ABSTRACT

The purpose of this study was to evaluate the feasibility of validating
ASVAB enlistment standards against job performance. Hands-on and written
proficiency tests were developed for three Marine Corps skills (Ground
Radio Repair, Automotive Mechanic, and Infantry Rifleman) for use as
measures of job performance. Grades in skill training courses were also
evaluated as possible measures of job performance.

The ASVAB was shown to be a valid predictor of job performance. All
three measures (hands-on tests, written tests, and training grades) were
generally consistent measures of performance. A preliminary set of ASVAB
qualification standards for assigning recruits to these three skills was
computed using the hands-on and written tests as the criterion measure.
The ASVAB standards derived from this analysis are similar to the
standards based on the traditional criterion measure of training-course
grades. We conclude that validating ASVAB enlistment standards against
job performance appears to be feasible. Although job performance tests
can be used for this purpose, they are costly to develop and administer.
Training grades, which are routinely available, may serve as a
satisfactory and economical proxy for them in many skills.


EXECUTIVE  SUMMARY 


INTRODUCTION 

Each year the military services test approximately one million
applicants for enlistment, and of these about one quarter fail to meet
the mental standards. Mental standards are defined in terms of
educational level (standards for high school graduates are lower than
for nongraduates) and scores on the Armed Services Vocational Aptitude
Battery (ASVAB).

Questions  about  the  appropriateness  of  mental  standards  have  arisen 
because  of  problems  with  ASVAB  scores.  In  the  late  1970s  ASVAB  scores 
were  seriously  inflated  because  of  an  error  in  calibrating  the  test.  As 
a  result,  the  standards  were  inadvertently  lowered,  and  the  services 
enlisted  many  people  who  would  have  failed  to  qualify  if  the  ASVAB 
scores  as  reported  had  accurately  measured  mental  aptitudes.  When  the 
problems  with  the  scores  became  widely  known,  the  Congress  and  Defense 
personnel  managers  wanted  to  know  the  effects  of  the  inflated  ASVAB 
scores  on  job  performance.  In  effect,  the  question  was  whether  the 
influx  of  people  who  should  have  failed  to  qualify  seriously  affected 
job  performance.  When  the  personnel  managers  turned  to  the  ASVAB 
research  analysts  for  answers,  they  found  that  whereas  the  ASVAB  was 
known  to  be  a  valid  predictor  of  grades  in  training  courses,  not  much 
was  known  about  the  relationship  between  the  ASVAB  and  job  performance. 

A large joint-service research program was then initiated to
determine whether enlistment standards could be validated against job
performance. The research task is to develop measures of job performance
and to determine how well the ASVAB predicts scores on those measures.
If the research demonstrates that the ASVAB predicts job performance,
then enlistment standards can be validated against job performance.

PURPOSE 

The purpose of this study was to evaluate the feasibility of
validating the ASVAB against measures of job performance. The
objectives of the study were to determine:

•  The  ability  of  the  ASVAB  to  predict  job  performance 

•  The  relationship  between  job  performance  tests,  which  are 
expensive  to  develop  and  administer,  and  other  indicators 
of  performance  that  are  less  expensive  to  obtain,  notably 
training  grades 

• ASVAB qualification standards that would result from using
measures of job performance as the criteria for validating
the ASVAB.

The benchmark measures of job performance in the joint-service
research program are job-sample tests that involve hands-on performance
of tasks representative of all the important tasks in a job. Other
measures or indicators, such as written tests of job skills and
knowledge and training grades, are evaluated by their degree of
relationship to the benchmark hands-on tests. To the extent these proxy
measures are related to the hands-on tests, they can be used to
supplement or serve as substitutes for the costly hands-on tests.

PROCEDURES 

Three representative Marine Corps job skills were selected: Ground
Radio Repair, Automotive Mechanic, and Infantry Rifleman. These skills
vary widely in their job requirements. The Ground Radio Repair
specialty has high technical demands (37 weeks of formal school
training), Automotive Mechanic has moderate demands (13 weeks of
training), and Infantry Rifleman has relatively low technical demands
(5 weeks of training). For each specialty, Marine Corps job experts,
assisted by testing psychologists, developed a hands-on test and a
written test. The tests were administered by the Marine Corps to people
in each specialty. Training course grades, routinely available in the
Marine Corps, were also collected.

RESULTS 

ASVAB  as  a  Predictor  of  Job  Performance 


The primary objective of this study was to evaluate how accurately
the ASVAB predicts job performance. If the ASVAB is an accurate
predictor, it can be used confidently to set mental standards. The ASVAB
did prove to be a valid predictor of hands-on performance tests in all
three skills. The validity of the relevant ASVAB aptitude composite for
each specialty is shown in table I. The validity coefficients are close
to .6. The percent of satisfactory performers in 10-point intervals of
ASVAB aptitude composite scores is shown in figure I.
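
As an illustration of the two statistics reported here, the following
Python sketch computes a validity coefficient as a Pearson correlation
and tabulates the percent of satisfactory performers in 10-point
intervals of composite score. The records, the passing score of 60, and
the function names are hypothetical and are not taken from the study.
This summary does not state how satisfactory performance was defined, so
a fixed passing score on the hands-on test is assumed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_satisfactory_by_interval(composite, hands_on, passing_score, width=10):
    """Percent of examinees at or above passing_score in each composite interval.

    Keys of the returned dict are the lower bounds of the intervals.
    """
    counts = {}
    for c, h in zip(composite, hands_on):
        lower = int(c // width) * width
        total, passed = counts.get(lower, (0, 0))
        counts[lower] = (total + 1, passed + (1 if h >= passing_score else 0))
    return {
        lower: 100.0 * passed / total
        for lower, (total, passed) in sorted(counts.items())
    }


# Hypothetical records, not data from the study:
# aptitude composite scores and hands-on test scores for ten examinees.
composite = [82, 88, 91, 97, 103, 105, 112, 118, 121, 127]
hands_on = [41, 55, 48, 60, 58, 71, 66, 80, 77, 85]

validity = pearson_r(composite, hands_on)
table = percent_satisfactory_by_interval(composite, hands_on, passing_score=60)

print(f"validity coefficient: {validity:.2f}")
for lower, pct in table.items():
    print(f"{lower}-{lower + 9}: {pct:.0f} percent satisfactory")
```

With real data, the first number corresponds to an entry in table I and
the second tabulation to the kind of display described for figure I.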

Relationship  of  Proxy  Measures  to  Hands-on  Tests 

The second objective was to evaluate proxy measures of performance
(written tests and training grades) in terms of their relationship to
the benchmark hands-on job performance tests. The correlation of the
proxy measures with hands-on tests is shown in table II. For the two
technical skills (Ground Radio Repair and Automotive Mechanic), the
written tests and training grades show promise as substitutes for the
hands-on tests. For the Infantry Rifleman skill, the written test shows
promise as a substitute for the hands-on test, but because of lower
correlation with the hands-on test, training grades show less promise.


TABLE I

VALIDITY OF THE ASVAB AS A PREDICTOR OF JOB PERFORMANCE

Skill                    Validity coefficient (a)

Ground Radio Repair      .59
Automotive Mechanic      .56
Infantry Rifleman        .58

(a) Validity of appropriate ASVAB aptitude composite for
predicting hands-on job performance test scores.


FIG. I: PERCENT SATISFACTORY PERFORMERS ON JOB
PERFORMANCE TESTS BY ASVAB APTITUDE COMPOSITE
[Figure not reproduced. Horizontal axis: ASVAB aptitude composite score.]


The validity of the appropriate ASVAB composites for predicting the
proxy measures is also shown in table II. Except for training grades in
the Infantry Rifleman skill, the ASVAB validity coefficients are high
(.65 or higher).


TABLE II

CORRELATION OF PROXY MEASURES OF JOB PERFORMANCE
WITH HANDS-ON TESTS AND THE ASVAB

                                          Correlation with:

                                          Hands-on            ASVAB aptitude
Skill                  Proxy measure      performance test    composite (a)

Ground Radio Repair    Written test       .51                 .73
                       Training grades    .52                 .75

Automotive Mechanic    Written test       .45                 .65
                       Training grades    .51                 .83

Infantry Rifleman      Written test       .56                 .69
                       Training grades    .39                 .29

(a) Correlation with appropriate aptitude composite.


Qualification  Standards 


The third objective was to evaluate the ASVAB qualification
standards that would result from using job performance as the criterion
for validating the ASVAB. Three pieces of information were required for
this preliminary evaluation:

•  Assumptions  about  the  percent  of  the  total  population  that 
would  be  satisfactory  performers  in  the  skills 

•  Acceptable  rate  of  unsatisfactory  performance  in  the  skill 
among  those  qualified  on  the  ASVAB 

•  Predictive  validity  of  the  ASVAB. 

Assumptions  About  the  Percent  of  Satisfactory  Performers 


Based on the experience of the military services and civilian world
of work, we assumed that 50 percent of the population would be
satisfactory radio repairers, 70 percent would be satisfactory
automotive mechanics, and 80 percent would be satisfactory infantry
riflemen. These percentages reflect the relative difficulty of the
skills.

Acceptable Rate of Unsatisfactory Performance


The second piece of information reflects the cost that the Marine
Corps, or any employer, is willing to bear to train or keep people on
the payroll who are unsatisfactory performers. We assumed the Marine
Corps would tolerate a failure rate of 10 percent (either in training or
on the job or some combination of the two).

Predictive  Validity  of  the  ASVAB 


The  third  piece  of  information  is  the  predictive  validity  of  the 
ASVAB  in  the  full  population.  We  used  a  combination  of  hands-on  and 
written  proficiency  tests  as  the  criterion  measures  of  performance 
because  both  have  content  validity. 
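
One way these three pieces of information could be combined is sketched
below in Python. The sketch assumes that aptitude and job performance
follow a bivariate normal distribution, which this summary does not
state, and it searches for the lowest cutoff at which the expected rate
of unsatisfactory performers among those who qualify is no more than the
acceptable rate. The full-population validity of .70, the score scale
with mean 100 and standard deviation 20, and all function names are
illustrative assumptions. The sketch is not presented as the procedure
used to derive the standards in this study.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def failure_rate_above_cutoff(z_cut, base_rate, validity, steps=1000, upper=8.0):
    """P(unsatisfactory | aptitude z-score >= z_cut) under a bivariate normal model.

    base_rate is the share of the whole population that would be satisfactory;
    validity is the correlation between aptitude and performance.
    """
    # Performance threshold: the top `base_rate` share of the population passes.
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)
    resid_sd = sqrt(1.0 - validity ** 2)
    h = (upper - z_cut) / steps
    total = 0.0
    # Trapezoid rule over aptitude scores from z_cut up to `upper`.
    for i in range(steps + 1):
        x = z_cut + i * h
        # Density of aptitude at x times P(performance < y_cut | aptitude = x).
        value = STD_NORMAL.pdf(x) * STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)
        total += value * (0.5 if i in (0, steps) else 1.0)
    share_qualified = 1.0 - STD_NORMAL.cdf(z_cut)
    return (total * h) / share_qualified


def qualifying_z(base_rate, validity, max_failure_rate, lo=-4.0, hi=4.0):
    """Lowest z-score cutoff whose failure rate does not exceed max_failure_rate."""
    if failure_rate_above_cutoff(lo, base_rate, validity) <= max_failure_rate:
        return lo  # no effective cutoff is needed
    for _ in range(40):  # bisection; failure rate falls as the cutoff rises
        mid = (lo + hi) / 2.0
        if failure_rate_above_cutoff(mid, base_rate, validity) > max_failure_rate:
            lo = mid
        else:
            hi = mid
    return hi


def qualifying_score(base_rate, validity, max_failure_rate, mean=100.0, sd=20.0):
    """Cutoff on an assumed composite score scale (mean 100, SD 20 by default)."""
    return mean + sd * qualifying_z(base_rate, validity, max_failure_rate)


# The base rates and the 10 percent failure rate are the assumptions stated in
# the text. The validity of 0.70 is a hypothetical placeholder.
ASSUMED_BASE_RATES = {
    "Ground Radio Repair": 0.50,
    "Automotive Mechanic": 0.70,
    "Infantry Rifleman": 0.80,
}
for skill, base_rate in ASSUMED_BASE_RATES.items():
    score = qualifying_score(base_rate, validity=0.70, max_failure_rate=0.10)
    print(f"{skill}: qualifying score about {score:.0f}")
```

Under this model the cutoff rises as the skill becomes more difficult
(a lower percent of satisfactory performers) and falls as the validity
of the predictor increases.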

Qualification  Standards  Derived  in  This  Study 

Table III shows the qualification standards on the appropriate
aptitude composites that were derived in this study. The similarity of
these qualifying scores to those currently used supports the
reasonableness of existing ASVAB qualification standards based on the
traditional criterion measure of grades in skill training courses.


TABLE III

ASVAB QUALIFICATION STANDARDS

                         Qualification standards (a)

Skill                    Existing    Derived

Ground Radio Repair      115         115
Automotive Mechanic      90          95
Infantry Rifleman        80          85

(a) Existing standards are for high school graduates; derived
standards were estimated in this study.

FUTURE  RESEARCH 

The usefulness of the ASVAB for selecting and classifying recruits
is supported by this study. The close correspondence of ASVAB
qualification standards based on the hands-on and written proficiency
tests with the traditional standards, based on training grades as the
performance measures, should serve to increase confidence in using the
ASVAB for selecting and classifying recruits.

Additional research is required to establish more firmly the
credibility of training grades as performance measures for validating
the ASVAB. If training grades are found to have adequate content
validity across a broad range of military skills, then the ASVAB can
continue to be validated against them. They have the advantage that they
are readily available for virtually all recruits, in contrast to the job
performance measures, which are expensive to develop and administer. The
cost to develop, administer, and analyze each of the job performance
measures in this study was approximately $360,000. This cost should be
regarded as a minimum because this effort was a feasibility study. In
more definitive studies, the development of the performance measures
will be more systematic, and the costs will be considerably higher. For
skills in which training grades do not have content validity, hands-on
or written proficiency tests may need to be developed. The joint-service
research program to validate the ASVAB against job performance is
addressing the credibility of proxy performance measures.

CONCLUSIONS 

•  The  ASVAB  is  a  valid  predictor  of  job  performance. 

•  Enlisted  qualifying  standards  can  be  validated  against  job 
performance. 

•  Qualifying  standards  derived  by  using  job  performance  as 
the  criterion  measure  are  similar  to  current  Marine  Corps 
standards. 

•  In  technical  skills,  training  grades  that  have  been 
routinely  available  for  recruits,  and  therefore  are  an 
economical  criterion  measure,  appear  to  be  about  as 
satisfactory  as  job  performance  tests  for  validating 
qualification  standards. 

•  For  nontechnical  skills,  job  performance  measures  may  need 
to  be  developed  for  validating  qualification  standards. 

RECOMMENDATIONS 

•  Additional  Marine  Corps  jobs  should  be  examined  to 
determine  if  the  conclusions  in  this  report  can  be 
generalized. 

• Numerical grades in job training courses, rather than
simple pass/fail grades, should be routinely recorded and
retained for use as criterion measures in future research
efforts to validate the ASVAB.

CHAPTER 1


EVALUATION  OF  PERFORMANCE 


INTRODUCTION 

When forms 5, 6, and 7 of the Armed Services Vocational Aptitude
Battery (ASVAB 5/6/7) were introduced on 1 January 1976, enlistment
standards were inadvertently lowered. The score scale for ASVAB 5/6/7
was inflated compared to the traditional meaning of ASVAB scores. ASVAB
5/6/7 was used through September 1980. During that time, about 25
percent of the recruit accessions would not have qualified for
enlistment if the tests had been accurately calibrated to the
traditional ASVAB score scale.

New  versions  of  the  ASVAB,  forms  8,  9,  and  10  (ASVAB  8/9/10),  were 
introduced  on  1  October  1980.  Because  ASVAB  8/9/10  was  accurately 
calibrated  to  the  traditional  score  scale,  enlistment  standards  would 
have  been  higher  if  the  same  nominal  standards  used  with  ASVAB  5/6/7  had 
remained  in  effect.  When  ASVAB  8/9/10  was  introduced,  all  services 
except  the  Marine  Corps  lowered  enlistment  standards  to  about  the  same 
level  that  the  actual  standards  had  been  with  ASVAB  5/6/7.  Thus,  by 
maintaining  the  same  nominal  standards,  the  Marine  Corps  in  effect 
raised  the  minimum  qualifying  scores  for  enlistment. 

While  ASVAB  8/9/10  was  being  prepared  for  operational  use, 
personnel  managers  in  the  Department  of  Defense  (DoD)  became  concerned 
about  what  the  enlistment  standards  ought  to  be.  The  intent  of 
enlistment  standards  is  to  prevent  potential  unsatisfactory  performers 
from  entering  the  service.  DoD  personnel  managers  wanted  to  know  how 
well  the  ASVAB  identifies  applicants  for  enlistment  who  would  have 
unsatisfactory  levels  of  performance  in  their  military  jobs. 

The  personnel  managers  turned  to  the  ASVAB  research  community  for 
information  about  the  relationship  between  ASVAB  scores  and  job 
performance.  The  ASVAB  and  previous  versions  of  military  selection  and 
classification  test  batteries  have  been  extensively  validated  as 
predictors  of  success  in  skill  training  courses,  but  there  have  been  no 
large-scale  efforts  to  relate  ASVAB  scores  to  job  performance.  Because 
success  in  skill  training  courses  has  not  been  systematically  related  to 
job  performance,  the  relationship  between  ASVAB  and  job  performance 
remains  questionable.  The  research  community  could  not  document  the 
ASVAB  as  a  valid  predictor  of  performance  on  the  job.  As  a  result,  the 
Office  of  the  Assistant  Secretary  of  Defense  for  Manpower,  Reserve 
Affairs,  and  Logistics  (ASD(MRA&L))  requested  each  service  to  validate 
ASVAB-related  enlistment  standards  against  performance  on  the  job. 

In this chapter, we discuss some of the issues in measuring job
performance and in defining the content of performance measures. We
then describe the research design for evaluating the credibility of the
performance measures and the validity of the ASVAB as a predictor of job
performance. This chapter is longer than a customary introductory
chapter for several reasons. First, the measures used and procedures
followed to validate the ASVAB need to be explained in detail because
validation is not carried out on a regular basis. Second, developing and
administering tests is a complex process, which also needs to be
explained in detail. Third, to set the tone for the work that follows,
we need to explain some of the pitfalls in constructing and analyzing
performance measures.

MEASUREMENT  OF  JOB  PERFORMANCE 

A major reason the ASVAB has not been systematically validated
against job performance is that measuring performance on the job is
inherently difficult and expensive. Until recently the services, like
most employers, were unwilling to fund the cost of developing and
administering measures of job performance. In part because of the
problem with the inflated ASVAB scores and the ensuing concern about
enlistment standards, the services are currently willing to explore the
feasibility of validating ASVAB enlistment standards against job
performance.

Performance  and  Proficiency 


On the surface, job performance appears to be a simple concept that
is readily observable and quantifiable; people are performing in their
jobs or skills, and the level of performance should, theoretically, be
readily ascertainable. In practice, records of performance by
individual workers usually are not available, or if they are, the
entries are not reliable. Furthermore, the definition of the term
performance is itself not precise.

Performance is frequently thought to be identical to proficiency.
Proficiency, as generally used in DoD, refers to competence, that is,
the ability to perform job tasks; proficiency tests measure the skills
and knowledge required to perform job tasks. Level of proficiency
typically is measured in an explicit testing environment, using
instruments specifically developed to measure competence on a set of job
skills and knowledge. The examinees know they are being tested, and the
scores reflect competence as demonstrated in a testing environment
rather than typical performance in the natural job environment.
Performance in DoD usage may refer to competence as demonstrated on
proficiency tests, or it may refer to how well a person typically
performs in the natural job environment.

In this report we attempt to maintain a distinction between
proficiency and performance. When referring to proficiency, we mean
competence as demonstrated on explicit measuring instruments; these
instruments could be administered as special tests on the job site or
during a job training course. In either case, the examinees know they
are being tested and evaluated. The word performance, however, is such
a general term that we cannot use it consistently. Sometimes it refers
to a type of measurement. A performance test usually means a job-sample
test for which the examinees actually perform a set of job tasks.
Performance tests usually imply hands-on testing, but not always.
Sometimes performance is used generically to encompass what workers do,
such as job performance. Because performance is such a general term and
no other suitable term is available, the ambiguity remains, and in this
report the context will help define how we use the word.

Requirements  of  Performance  Measures 


The fundamental requirement of job performance measures is that
they should be relevant to job requirements. The content of the
measures should reflect the content of the job; the closer the
correspondence, the greater the "content validity" of the measure.
Content validity is determined by expert judgment. Workers known to be
proficient in the job evaluate the relevance of the measuring
instruments to job requirements. Hands-on proficiency tests, in which
examinees perform tasks encountered on the job, have a high degree of
content validity. In fact, hands-on proficiency tests are the benchmark
criterion by which the job relevance of other types of proficiency or
performance measures is evaluated.

Another  requirement  for  performance  measures  is  that  the  scores 
accurately  reflect  the  level  of  performance  of  the  examinees.  To  the 
extent  that  the  scores  are  accurate,  they  can  be  reproduced  in  other 
testing  situations.  With  hands-on  proficiency  tests,  the  scores  are 
accurate  if  different  test  administrators  would  assign  the  same  scores 
or  if  the  examinee  would  attain  the  same  score  when  retested  on  another 
occasion. 
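
The first of these checks, agreement between test administrators, can
be illustrated with a short Python sketch. It assumes that each step of
a hands-on task is scored 1 (performed properly) or 0 (not performed
properly) by two administrators watching the same examinee; the scores
shown are hypothetical. Percent agreement and Cohen's kappa are common
agreement indexes that we use here for illustration only, and this
report does not say that either index was computed in the study.

```python
def percent_agreement(scores_a, scores_b):
    """Share of steps on which two administrators assigned the same score."""
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement corrected for chance, for two raters using 0/1 scores."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    pass_a = sum(scores_a) / n
    pass_b = sum(scores_b) / n
    # Agreement expected if the two administrators scored independently.
    expected = pass_a * pass_b + (1.0 - pass_a) * (1.0 - pass_b)
    if expected == 1.0:
        return 1.0  # both raters gave every step the same single score
    return (observed - expected) / (1.0 - expected)


# Hypothetical step-by-step scores for one examinee on a ten-step task.
administrator_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
administrator_b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]

agreement = percent_agreement(administrator_a, administrator_b)
kappa = cohens_kappa(administrator_a, administrator_b)
print(f"percent agreement: {100 * agreement:.0f}")
print(f"kappa: {kappa:.2f}")
```

The second check, retesting on another occasion, could be summarized in
the same way by correlating total scores from the two occasions.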

Hands-on proficiency tests are the core performance measures.
Because they have such a high degree of content validity they are the
basis for evaluating the job relevance of other performance measures.
Hands-on tests, however, generally suffer from a lack of scoring
accuracy. The rules for administering and scoring hands-on tests cannot
be explicated with sufficient clarity to ensure objective scoring; the
test administrator invariably has latitude to exercise subjective
judgment about what kinds of cues or hints to provide examinees and what
standards to use to determine satisfactory and unsatisfactory levels of
performance. Hands-on proficiency tests consist of a series of steps in
performing a task. The administrator must decide whether the examinee
has accomplished each step properly. In spite of the limitations
arising from subjectivity in scoring, the content validity of hands-on
tests still is the overriding consideration for determining the job
relevance of other types of performance measures.

Types of Performance Measures


In our analysis, we included three types of performance measures.
Two measures, the hands-on and written tests, were constructed
specifically to reflect job requirements in the skill. The third,
training grades, should also reflect job requirements. Because all three
measures are related to the same job requirements, they should be
related to each other. Each of these measures has advantages and
disadvantages.

Hands-on  Proficiency  Tests 


We  have  already  discussed  the  primary  advantage  of  hands-on 
proficiency  tests — their  content  validity.  Their  disadvantages  are  the 
questionable  scoring  accuracy  and,  more  important,  the  high  cost  of 
administering  them.  Hands-on  tests  require  administrators  who  are 
themselves  experts  in  performing  the  tasks,  and  who  can  make  accurate 
decisions  about  the  performance  of  examinees.  Typically,  in  hands-on 
testing  an  administrator  can  test  only  one  examinee  at  a  time.  In 
addition,  the  tests  ordinarily  require  expensive  equipment  be  set  aside. 
Because  of  the  resource  demands  placed  on  field  units  to  provide 
experienced  test  administrators  and  equipment,  the  services  have  been 
reluctant  to  support  large-scale  administrations  of  hands-on  tests. 

Written  Proficiency  Tests 


For years some services have used written proficiency tests to help
evaluate job competence of enlisted personnel. Because of their
paper-and-pencil format and multiple-choice items, their relevance to
job requirements is questionable. Through careful preparation, written
tests can require examinees to demonstrate many of the skills and
knowledge required to perform job tasks. For example, written tests can
require examinees to make the same kinds of decisions and perceptions
they are required to make on the job. Written tests can also test only
trivial facts that experienced workers may know but that are not
required for performance of tasks. The latter type of test is much
easier to construct and, unfortunately, all too often has been the type
constructed. Written tests that focus on trivia and theory do lack
content validity, and thereby cast doubt on the content validity of all
written proficiency tests.

Written tests with content validity are expensive to construct.
Job experts should provide the content, and other job experts should
review the test items to make sure that the items measure skills and
knowledge used on the job. The test should also be taken and critiqued
by representative workers to make sure that the language is suitable.
The key to content validity of written tests is that the examinees be
required to apply their knowledge and skills to solving the same kinds
of problems they encounter when performing job tasks.

Training Course Grades


Grades in skill training courses have served as the traditional
criterion measures for validating ASVAB and previous military selection
and classification test batteries. Training grades had the advantages
that they were routinely available for almost all recruits, and they
were based on objective evaluations of performance in job training
programs. Their main disadvantage is questionable content validity.
Just as written proficiency tests can include trivia and unnecessary
theory, so can training courses. Training courses are sometimes
criticized for emphasizing memory and verbal ability rather than
competence to perform job tasks.

The training grades included in this analysis are based on
traditional methods of instruction. The grades are based primarily on
the percentage of test items answered correctly, where tests were
administered at the end of instructional units and at the end of the
course. Because they have the same characteristics as the traditional
criterion measures for validating ASVAB, whatever we learn about their
job relevance in this study should generalize to the meaning of training
grades in prior validation efforts.

Training  courses  in  all  services  are  being  revamped  to  conform  to 
the  Instructional  System  Design  or  Development  (ISD)  model,  and  the 
meaning  of  course  grades  is  changing.  The  core  of  the  ISD  model  is  that 
training  course  content  should  be  based  on  job  requirements.  Normally, 
in  revamped  courses,  the  training  and  testing  to  evaluate  student 
proficiency  both  use  the  hands-on  mode.  Students  practice  performing 
job  tasks,  and  then  they  are  tested  on  how  well  they  perform  the  same 
tasks.  The  training  objectives  are  clearly  specified  in  performance 
terms,  and  typically  student  performance  is  reported  simply  as  pass  or 
fail  (ISD  terminology  is  GO/NO-GO).  Information  about  the  rank  order  of 
students,  such  as  percentage  of  steps  passed  on  the  first  attempt  to 
complete  an  instructional  module,  is  not  reported.  For  validation 
purposes,  the  pass-fail  scoring  is  not  adequate.  Validation  requires 
that  information  about  individual  differences  in  level  of  achievement  be 
available.  Individual  differences  in  the  predictor  scores  are  then 
related  to  individual  differences  in  achievement.  The  higher  the 
relationship,  the  more  valid  the  predictor  test. 
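
The loss of information from pass-fail scoring can be illustrated with
a small simulation. The Python sketch below draws hypothetical predictor
and achievement scores with a known correlation, rescores achievement as
pass or fail, and compares the two validity coefficients. The
correlation of .60, the 90 percent pass rate, the sample size, and the
function names are assumptions chosen for illustration; no data from the
study are used.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_scoring(validity=0.60, pass_rate=0.90, n=20000, seed=1):
    """Return (validity with graded scores, validity with pass-fail scores)."""
    rng = random.Random(seed)
    predictor, achievement = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Achievement correlates with the predictor at the chosen validity.
        y = validity * x + sqrt(1.0 - validity ** 2) * rng.gauss(0.0, 1.0)
        predictor.append(x)
        achievement.append(y)
    # Pass-fail rescoring: roughly the top `pass_rate` share of students pass.
    cut = sorted(achievement)[int((1.0 - pass_rate) * n)]
    pass_fail = [1.0 if y >= cut else 0.0 for y in achievement]
    return pearson_r(predictor, achievement), pearson_r(predictor, pass_fail)


graded, dichotomized = compare_scoring()
print(f"validity against graded scores:    {graded:.2f}")
print(f"validity against pass-fail scores: {dichotomized:.2f}")
```

In this simulation the same predictor shows a lower coefficient against
the pass-fail criterion than against the graded criterion, even though
the underlying relationship is unchanged.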

Our analysis of training course grades to determine their
usefulness as criterion measures of job performance may not generalize
to the new type of training courses. Because of the changes in the
revamped courses following the ISD model, the job relevance of the
grades could be higher or lower. The content of the course suggests
higher job relevance, but the pass-fail scoring dilutes the usefulness
of training grades as criterion measures. The relevance of both types
of training grades to job performance remains to be determined.

Ratings of Job Proficiency


Ratings  by  supervisors  are  the  time-honored  means  for  evaluating 
performance  of  workers.  Most  personnel  decisions  based  on  quality  of 
performance  include  supervisor  ratings.  Because  ratings  are  used  so 
pervasively,  it  is  natural  to  question  why  they  are  not  adequate 
measures  of  performance  for  validating  the  ASVAB  and  enlistment 
standards.  The  answer  rests  on  their  questionable  relevance  to  job 
requirements  and  the  low  accuracy  of  their  scores. 

In general, ratings are subjective evaluations that may include a
component of competence to perform job tasks, but they may also reflect
other components such as cooperativeness, personal appearance, and
punctuality. Rating scores tend to fluctuate from rater to rater, or
even from time to time for the same rater. Just as hands-on tests
require judgment in scoring, so do ratings incorporate judgments, with
even less precise rules for assigning scores. Unlike hands-on tests,
which have high job relevance, ratings usually do not compensate
for their questionable scoring accuracy with high content validity. For
these reasons, we did not include supervisors’ ratings in our analysis.

We  used  three  types  of  performance  measures — hands-on  proficiency 
tests,  written  proficiency  tests  and  training  course  grades — as  the 
criteria  for  validating  the  ASVAB  in  our  study.  The  hands-on  and 
written  proficiency  tests  were  developed  especially  for  this  study,  but 
the  training  grades  were  obtained  from  Marine  Corps  records.  We
developed  proficiency  tests  for  three  skills:  Ground  Radio  Repair,
Automotive  Mechanic,  and  Infantry  Rifleman.  In  the  following  section,  we
discuss  some  issues  in  deciding  on  the  content  of  the  proficiency  tests. 

CONTENT  OF  PROFICIENCY  TESTS 

The  starting  place  for  determining  content  of  the  proficiency  tests 
is,  of  course,  job  requirements.  Beyond  that  general  statement,
divergent  points  of  view  abound  about  how  to  define  job  requirements.  One
point  of  view  is  that  the  content  should  be  based  on  the  specific 
requirements  in  a  specific  duty  assignment.  In  all  services,  a  person 
is  assigned  to  fill  a  particular  position,  and  from  this  point  of  view, 
content  of  the  proficiency  tests  should  be  based  on  the  requirements  for 
a  particular  individual  in  a  particular  assignment.  A  second  point  of 
view  is  that  the  content  should  enable  generalization  from  the  content 
of  the  measures  to  performance  on  all  requirements  in  the  skill. 

Another  consideration  is  whether  the  content  should  cover  peacetime  or 
combat  requirements.  The  positions  are  not  necessarily  mutually
exclusive,  and  there  are  arguments  to  support  each  point  of  view.

Limit  Content  to  a  Specific  Duty  Assignment? 

In  the  civilian  economy,  a  set  of  job  requirements  usually  defines 
the  responsibility  of  workers,  and  workers  are  evaluated  by  managers 
according  to  how  well  they  carry  out  their  assigned  responsibilities.

In  the  military  services,  management  of  workers  ordinarily  is  by  skill 
rather  than  by  specific  duty  assignments,  where  duty  assignments
correspond  to  jobs  in  the  conventional  sense.  Recruits  are  trained  to
perform  in  a  skill,  which  covers  a  variety  of  duty  assignments.  Service 
personnel  ordinarily  are  eligible  for  assignment  to  any  duty  position 
within  the  skill.  Hence,  the  question  arises  whether  the  content  should 
be  specific  to  the  job  assignment  or  be  representative  of  the  skill. 

If  the  performance  tests  are  to  describe  how  well  workers  are 
performing  at  a  fixed  point  in  time,  say  1  year  after  completion  of 
skill  training,  then  a  reasonable  approach  is  to  define  content  in  terms 
of  specific  job  assignments.  Or,  if  we  want  to  know  how  well  examinees 
are  contributing  to  the  effectiveness  or  readiness  of  their  units,  then 
content  should  be  limited  to  the  assignments.  Another  argument  is  that 
the  best  predictor  of  future  performance  is  present  performance.  At  the 
very  least,  performance  tests  should  be  relevant  to  requirements  in  the 
examinees'  current  assignments. 

Generalize  to  Requirements  in  the  Skill? 

Because  the  purpose  of  the  performance  measures  used  in  this  study 
is  to  serve  as  criteria  for  performance  in  the  skill,  we  must  be  able  to 
generalize  to  requirements  for  the  entire  skill.  The  only  question  is 
how. 


The  safest  and  simplest  way  is  to  develop  the  performance  test
content  to  facilitate  generalizing  to  the  skill.  All  important  content
areas  of  the  skill  should  be  included  in  the  tests.  The  hands-on  and 
written  proficiency  tests  used  in  this  study  were  designed  to  represent 
the  critical  requirements  in  the  skill. 

In  addition  to  covering  the  important  content  areas  of  the  entire 
skill,  generalizing  is  facilitated  by  having  all  examinees  respond  to 
the  same  test  content.  The  performances  of  examinees  then  can  be 
compared  directly  with  each  other  because  they  are  on  the  same  score 
scale.  Although  measures  designed  to  cover  the  requirements  for  a 
particular  assignment  may  also  serve  as  measures  for  generalizing  to  the 
skill,  the  measurement  problems  are  momentous.  In  the  final  chapter  we 
discuss  some  problems  of  scaling  measures  that  have  different  content. 

Peacetime  or  Combat  Requirements? 

Although  the  obvious  position  is  to  include  combat  requirements, 
this  solution  generally  is  not  feasible.  For  technical  skills,  such  as 
radio  repair  and  mechanics,  the  tasks  are  similar  in  peacetime  and 
wartime,  and  the  main  difference  is  in  the  working  conditions.  For
combat  arms  skills,  such  as  rifleman,  both  the  job  requirements  and
the  conditions  under  which  tasks  are  performed  differ  somewhat.
If  combat  requirements  are  incorporated  into  the  measures,  then  combat
conditions  must  be  simulated.

Some  of  the  combat  requirements  are  that  the  tasks  be  performed  in 
the  vicinity  of  an  intelligent  foe,  i.e.,  one  who  is  able  to  take 
aggressive  action  against  the  examinees.  Also,  the  physical  stress  of 
combat  should  be  built  into  the  measures.  Ordinarily,  the  services  are 
not  willing  to  expose  examinees  to  the  risks  of  realistic  combat
conditions.  Furthermore,  combat  conditions  are  expensive  to  simulate.  From
a  measurement  point  of  view,  combat  conditions  tend  to  destroy
standardized  administration  and  scoring  procedures.  For  at  least  these  reasons,
performance  measures  usually  include  peacetime  requirements,  with 
attempts  to  incorporate  combat  conditions  as  feasible. 

How  to  Attain  Content  Validity? 


The  steps  in  constructing  proficiency  tests  involve  a  close  working 
relationship  between  measurement  and  job  experts.  Job  experts  provide 
the  crucial  information  about  job  requirements  ranging  from  the  broad 
content  areas  through  the  tasks  in  each  area  to  the  wording  of  test 
items  or  steps  in  a  task.  Job  experts  should  ensure  that,  to  the  extent 
feasible,  job  requirements  are  realistically  incorporated  into  the 
tests.  Measurement  experts  provide  guidance  about  structuring  the  job 
requirements  into  items  or  steps,  evaluating  the  tests  through  review  by 
panels  of  job  experts,  and  tryout  with  representative  examinees.  Job 
experts  should  play  the  central  role  in  developing  proficiency  tests. 

Job  experts  with  different  points  of  view  should  be  consulted 
during  development  of  the  tests.  One  reason  is  to  ensure  that  all 
critical  content  areas  are  covered  and  in  proper  balance.  They  also 
should  play  a  vital  role  in  ensuring  that  the  details  of  the  tests 
conform  to  job  requirements.  One  consideration  is  that  the  language  and 
concepts  of  written  tests  conform  to  ordinary  usage  of  workers  in  the 
skill.  Another  is  that  the  tasks  of  hands-on  performance  tests  are 
structured  similarly  to  the  way  they  are  typically  performed.  If  the 
details  deviate  from  job  requirements,  content  validity  is  lowered. 

A  fault  that  occurs  frequently  in  proficiency  tests  is  that 
examinees  are  asked  to  tell  what  they  know  about  the  job  rather  than  to 
demonstrate  that  they  know  how  to  perform  tasks.  Written  tests
especially  can  focus  on  abstract  facts  and  principles,  rather  than  requiring
examinees  to  apply  their  knowledge  in  practical  situations.  A  good 
strategy  is  to  present  a  work  situation  and  ask  the  examinees  what  they 
would  do  if  they  encountered  specific  conditions.  Even  hands-on
performance  tests  can  err  by  focusing  on  trivial  tasks,  and  they  may  reflect
procedures  different  from  those  workers  typically  use.  For  example,  the 
training  course  may  teach  one  set  of  procedures  for  performing  a  task, 
but  in  the  field  environment  different  procedures  may  be  used.  The  test 
should  be  based  on  the  procedures  used  in  the  field  rather  than  those 
taught  in  the  classroom.  Job  experts  should  be  alerted  to  these
problems,  and  the  content  should  be  reviewed  at  all  levels  of  detail.

In  this  study,  the  purpose  of  the  performance  evaluations  is  to 
serve  as  criterion  measures  for  validating  the  ASVAB.  As  criterion 
measures,  the  evaluation  scores  must  have  measurement  accuracy.  To  gain 
accuracy — standardized  testing  conditions  and  reproducibility  of  the 
scores — we  must  sacrifice  some  content  validity.  Some  realism  of  job 
requirements,  such  as  an  unstructured  working  environment,  must  be 
lost.  For  other  purposes,  such  as  identifying  training  deficiencies, 
content  validity  is  more  important,  and  some  scoring  accuracy  may  be 
sacrificed  to  achieve  greater  realism.  The  content  of  the  performance 
tests  used  in  this  study  is  a  reasonable  compromise  that  balances 
realistic  job  requirements,  working  conditions,  and  scoring  accuracy. 

RESEARCH  DESIGN 

The  research  design  was  intended  to  provide  data  on  the  predictive 
validity  of  the  ASVAB  across  a  broad  range  of  job  requirements.  We 
evaluated  the  effects  of  combining  the  performance  measures  in  different 
ways  to  determine  whether  the  use  of  alternative  criterion  measures 
would  result  in  selecting  the  same  or  different  people.  The  analyses  in 
this  report  were  directed  toward  the  establishment  of  qualifying
prerequisite  aptitude  composite  scores  in  each  skill.  Subsequent  analyses
will  address  the  question  of  how  to  combine  the  information  for  each 
skill  and  establish  enlistment  standards. 

Skills  Used  in  the  Study 


The  skills  used  in  the  study  range  from  high  to  low  in  their
technical  complexity.  The  most  technically  demanding  skill  is  Ground  Radio
Repair.  Radio  repairers  in  the  Marine  Corps  perform  many
troubleshooting  tasks.  Troubleshooting  requires,  first,  knowing  how  the
equipment  functions;  second,  applying  that  knowledge  to  diagnose
malfunctions;  and  third,  taking  appropriate  corrective  action.  The  skills
and  knowledge  are  primarily  mental,  or  cognitive,  but  psychomotor  skills
and  manual  dexterity  are  also  required  for  work  with  hand  tools,  such  as
soldering  in  tight  spaces.  The  formal  school  training  for  the  Ground
Radio  Repair  skill  lasts  about  37  weeks.

The  skill  with  intermediate  technical  demands  is  the  organizational 
level  Automotive  Mechanic.  The  organizational  level  mechanic  tends  to 
perform  the  more  routine  tasks,  such  as  engine  tune-up  and  removing  and
replacing  parts.  Complex  repair  tasks,  such  as  overhaul  of  the
transmissions,  are  performed  at  higher  levels  of  maintenance.  Automotive
mechanics  must  have  some  knowledge  of  how  the  various  systems  of  a 
vehicle  function;  they  must  also  be  proficient  in  the  use  of  tools  and 
equipment.  Mechanics  tend  to  have  a  balance  of  cognitive  and
psychomotor  demands  placed  on  them.  Their  formal  school  training  lasts
13  weeks.
Of  the  three  skills,  Infantry  Rifleman  places  the  fewest  technical
demands  on  incumbents.  Riflemen  must  have  physical  stamina  and  enough 
strength  to  carry  heavy  loads.  They  must  have  psychomotor  skills,  such 
as  marksmanship  and  first  aid.  Although  they  must  have  some  cognitive 
skills,  as  in  land  navigation,  first  aid  and  communication  signals,  most 
of  the  technical  complexity  is  handled  by  squad  leaders  and  higher 
ranking  personnel.  The  formal  school  training  for  Riflemen  lasts
5  weeks.

The  hands-on  and  written  proficiency  tests  for  these  skills  take 
different  forms;  the  content  is  described  next.  The  three  skills  also 
may  differ  in  the  degree  to  which  performance  is  predictable  by  the 
ASVAB.  The  ASVAB  consists  largely  of  items  that  tap  cognitive  skills
(vocabulary,  arithmetic  and  mathematical  problems,  and  knowledge  of
technical  fields).  Our  expectation  is  that  performance  in  the  Infantry
Rifleman  skill  is  least  predictable  by  the  ASVAB,  and  performance  in
the  Ground  Radio  Repair  skill  is  most  predictable.

Description  of  the  Proficiency  Tests 


Hands-on  and  written  proficiency  tests  were  developed  for  each 
skill.  The  tests  were  developed  by  Marine  Corps  job  experts,  with 
technical  assistance  from  the  Navy  Personnel  Research  and  Development 
Center  (NPRDC).  The  test  development  is  described  in  an  NPRDC  report
[1],  and  details  are  presented  in  appendix  A. 

The  tests  are  obtrusive  measures  of  proficiency  in  the  sense  that 
the  examinees  knew  they  were  being  tested.  They  were  not  informed 
beforehand,  however,  about  the  content  of  the  test.  The  examinees  were 
instructed  to  refrain  from  discussing  the  test  content  with  other 
Marines  who  would  be  tested  later.  The  tests,  therefore,  are  intended 
to  reflect  the  skills  and  knowledge  of  the  examinees  under  standard 
testing  conditions,  rather  than  their  level  of  performance  in  the  normal 
work  environment.

Ground  Radio  Repair  Proficiency  Test 


The  written  portion  of  the  radio  repair  test  consisted  of  59  items 
and  required  2  hours  of  testing  time.  The  written  test  had  four 
sections:

•  General  topics,  22  items — use  and  function  of  equipment 
and  calculating  quantities  for  simple  circuits 

•  Meters,  18  items — use,  function,  and  setting  up 

•  Oscilloscopes,  12  items — use  and  function 
•  AN/UIQ-10,  7  items — troubleshoot  an  unfamiliar  piece  of
equipment  from  description  of  symptoms,  using  technical 
manuals  and  troubleshooting  charts. 

The  testing  time  was  75  minutes  for  the  first  52  items  and  50  minutes 
for  the  last  7  items,  which  required  extensive  looking  up  of  information 
about  the  equipment. 

The  hands-on  test  consisted  of  troubleshooting  10  circuit  boards. 

A  total  of  210  minutes,  with  up  to  30  minutes  for  each  board,  was 
allowed  for  the  hands-on  test.  Some  examinees  were  not  able  to  work  on 
all  boards  because  of  the  total  time  limit.  For  each  board,  the 
examinees  were  to  identify  the  faulty  symptom  (worth  2  points),  circuit 
(4  points),  and  component  (up  to  8  points).  Examinees  were  encouraged 
to  guess  when  they  had  narrowed  the  choice  of  circuits  and  components. 
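
The per-board scoring rule just described can be sketched in code. The point values (2, 4, and up to 8) come from the text; the function names, the tuple layout, and the treatment of a board that was never reached as zero points are assumptions made for illustration and are not taken from the scoring procedure actually used.

```python
# Illustrative sketch only: names and data layout are hypothetical.
SYMPTOM_POINTS = 2        # from the text: faulty symptom identified
CIRCUIT_POINTS = 4        # from the text: faulty circuit identified
MAX_COMPONENT_POINTS = 8  # from the text: "up to 8 points" for the component


def score_board(symptom_ok, circuit_ok, component_points):
    """Return the points earned on one circuit board (maximum 14)."""
    if not 0 <= component_points <= MAX_COMPONENT_POINTS:
        raise ValueError("component points must be between 0 and 8")
    points = component_points
    if symptom_ok:
        points += SYMPTOM_POINTS
    if circuit_ok:
        points += CIRCUIT_POINTS
    return points


def score_examinee(boards):
    """Sum the board scores; a board not reached (None) is assumed to earn 0."""
    return sum(score_board(*board) for board in boards if board is not None)


# Hypothetical examinee: two boards attempted, a third never reached.
example_boards = [(True, True, 8), (True, False, 3), None]
example_total = score_examinee(example_boards)
```

Under these assumptions the maximum is 14 points per board, or 140 points across the 10 boards.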


The  hands-on  test  involved  the  use  of  meters,  signal  generator,  and 
oscilloscope  to  troubleshoot  the  circuit  boards.  The  examinees  could 
use  the  technical  manuals  and  troubleshooting  charts  for  the  equipment. 
None  of  the  examinees  had  ever  worked  on  this  piece  of  equipment  before. 
The  test  therefore  tapped  their  ability  to  apply  their  skills  and 
knowledge  in  a  novel  situation. 

Automotive  Mechanic  Proficiency  Test 

The  written  portion  of  the  automotive  mechanic  test  required 
2  hours  of  testing  time  and  consisted  of  61  items.  The  first  23  items 
covered  the  following  systems,  with  special  reference  to  the  M151 
quarter-ton  vehicle  (Jeep): 

•  Fuel  and  electrical — 12  items 

•  Steering — 3  items 

•  Cooling — 8  items. 

The  remaining  38  items  covered  the  M54  5-ton  multifuel  vehicle. 

Examinees  could  consult  technical  manuals  during  the  test. 

The  hands-on  test  consisted  of  four  tasks  on  the  M151  vehicle  and 
required  up  to  3-1/2  hours: 

•  Major  engine  tuneup — 2  hours 

•  Alternator  output  and  battery — 30  minutes 

•  Wheel  and  brake  maintenance — 60  minutes 

•  Equipment  repair  order — embedded  in  the  other  tests. 
Each  part  of  the  hands-on  test  consisted  of  steps,  with  each  step  scored
pass  or  fail.  The  score  is  the  number  of  steps  passed  for  each  part. 

The  test  administrators  provided  prompts  to  examinees  when  they  were 
stuck  on  a  step.  Administrators  exercised  their  own  judgment  about 
scoring  each  step  as  pass  or  fail.  No  systematic  instructions  were 
provided  about  the  amount  of  assistance  to  provide,  how  to  score  the 
step  when  prompts  were  provided,  or  how  to  record  the  fact  that  prompts 
were  given.  Scores  on  the  hands-on  test  therefore  may  vary  because  of 
the  varying  amounts  of  help  given  by  the  administrators. 

Infantry  Rifleman  Proficiency  Test 


The  written  test  for  the  infantrymen  had  100  points  and  required 
30  minutes  to  administer.  It  covered  the  following  topics: 

•  Infantryman  weapons  and  duties — 11  points 

•  Weapon  characteristics — 17  points 

•  Combat  intelligence  and  prisoner  handling — 29  points 

•  Acronyms — 24  points 

•  NBC  defense — 13  points 

•  Identification  of  tracked  vehicles  and  aircraft — 6  points. 

The  number  of  items  does  not  correspond  to  the  number  of  points  because 
complex  weighting  schemes  were  used  to  assign  points. 

The  hands-on  test  had  seven  tasks,  worth  a  total  of  332  points,  and 
required  about  4  hours.  The  tasks  and  points  for  each  are: 

•  Map  and  compass — 85  points 

•  First  aid — 43  points 

•  Fire  team  formations — 27  points 

•  Mines  and  booby  traps — 67  points 

•  Target  engagement — 110  points. 

The  number  of  points  includes  negative  scores  for  serious  errors,  such 
as  firing  on  friendly  targets  or  inability  to  tell  which  direction  is 
north  by  reading  a  compass.




Background  and  Job  Experience  of  Examinees 


Examinees  completed  a  brief  questionnaire  about  the  amount  and  type 
of  their  training  and  job  experience.  Examinees  in  all  three  skills 
were  asked  how  many  months  they  had  of  relevant  job  experience  since 
completing  formal  training.  The  automotive  mechanics  were  also  asked 
about  the  amount  of  their  paid  civilian  experience  as  mechanics. 

Data  Collection  Procedures 


All  proficiency  tests  were  administered  at  Camp  Pendleton,  CA.  The 
test  administrators  were  senior  Marine  Corps  enlisted  personnel  with  job 
experience  in  their  field.  Most  examinees  were  stationed  at  Camp 
Pendleton,  but  radio  repairers  and  automotive  mechanics  were  also 
obtained  from  other  Marine  Corps  locations  in  Southern  California.  All 
testing  for  each  examinee  was  accomplished  in  1  day.  The  test
administrations  were  conducted  from  August  through  November  1981.

Proficiency  Tests 


All  parts  of  the  hands-on  tests  in  the  Radio  Repair  and  Mechanic 
skills  for  any  one  examinee  were  administered  and  scored  by  the  same 
administrator.  Whatever  effect  an  administrator  had  on  the  test  scores, 
such  as  giving  prompts  about  the  correct  action  to  take,  was  the  same 
for  all  parts  of  the  hands-on  test  for  each  examinee. 

For  the  Infantry  Rifleman  skill,  the  test  administration  procedures 
were  different.  The  hands-on  test  for  the  infantryman  was  divided  into 
a  series  of  testing  stations.  Each  station  as  a  rule  was  handled  by  a 
different  test  administrator.  On  occasion,  the  same  administrator 
handled  several  stations  for  some  examinees.  The  administrators
sometimes  also  moved  to  different  testing  stations  on  different  days.  As  a
result,  the  effects  of  test  administrators  did  not  consistently  raise  or
lower  the  total  hands-on  score  for  any  one  infantry  examinee.

For  the  Radio  Repair  and  Mechanics  skills,  the  effects  of  test 
administrator  on  the  hands-on  scores  can  be  computed;  but  for  the 
Infantryman  skill,  the  complex  testing  arrangements  preclude  computing 
the  effects  of  test  administrators  on  the  hands-on  scores. 

As  we  mentioned  when  discussing  hands-on  proficiency  tests,  the 
scoring  of  hands-on  tests  requires  expert  judgment.  Experts  tend  to 
disagree  about  scoring  standards  and  about  how  to  handle  the  examinees, 
including  the  amount  and  type  of  prompting.  These  differences  among 
administrators  introduce  unwanted  variation  into  examinees’  scores. 
Ideally,  the  scores  should  reflect  the  competence  of  examinees  and 
nothing  else.  To  the  extent  that  some  examinees’  scores  are  raised  or
lowered  because  of  the  administrator,  the  test  scores  contain  error. 

The  differences  among  administrators  should  be  statistically  removed 
from  hands-on  scores  prior  to  computing  their  relationship  to  other
scores.
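
One simple way to remove administrator differences is to re-center each examinee's score on the mean of the administrator who scored it. The sketch below illustrates that idea only; it is not necessarily the adjustment used in this study, and the function name and data are hypothetical. It also assumes that examinees of comparable ability were spread across administrators, so that differences in administrator means reflect scoring leniency rather than examinee competence.

```python
from collections import defaultdict


def remove_administrator_effect(scores, administrators):
    """Shift each score so that every administrator's mean equals the grand mean."""
    if len(scores) != len(administrators):
        raise ValueError("scores and administrators must have the same length")
    grand_mean = sum(scores) / len(scores)

    # Collect the scores assigned by each administrator.
    by_admin = defaultdict(list)
    for score, admin in zip(scores, administrators):
        by_admin[admin].append(score)
    admin_mean = {admin: sum(vals) / len(vals) for admin, vals in by_admin.items()}

    # Subtract the administrator's mean and add back the grand mean.
    return [score - admin_mean[admin] + grand_mean
            for score, admin in zip(scores, administrators)]


# Hypothetical data: administrator "B" scores 20 points higher on average.
example_adjusted = remove_administrator_effect([10, 20, 30, 40], ["A", "A", "B", "B"])
```

After the adjustment, differences among examinees scored by the same administrator are preserved, while the average difference between administrators is removed.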

Another  possible  source  of  error  arises  from  poor  test  security. 
Because  only  a  few  examinees  can  be  tested  at  a  time,  and
administrations  are  spread  over  weeks  or  months,  examinees  tested  later  in  the
schedule  may  have  had  an  opportunity  to  practice  performing  the  tasks  in 
the  hands-on  test  and  learn  the  answers  to  the  written  test  items.  To 
minimize  leaks  about  test  content,  examinees  were  admonished  to  refrain 
from  discussing  the  test  with  other  potential  examinees.  To  the  extent 
that  some  examinees  had  prior  knowledge  and  others  did  not,  differences 
in  scores  will  be  misleading. 

Written  proficiency  tests  were  administered  on  the  same  days  as  the 
hands-on  tests.  Scoring  of  both  the  hands-on  and  written  tests  was  done 
centrally  by  CNA  rather  than  locally  by  test  administrators. 

Training  Grades 


Following  completion  of  the  testing,  we  attempted  to  collect 
training  grades  from  the  Marine  Corps  schools  where  the  examinees 
received  their  skill  training.  Many  of  the  examinees  had  graduated  from 
their  skill  training  courses  several  years  earlier,  and  the  schools  no 
longer  retained  the  records.  As  a  result,  the  samples  were  reduced 
because  of  missing  grades. 

ASVAB  Test  Scores 


We  also  collected  ASVAB  test  scores  for  the  examinees  following  the 
completion  of  testing.  Again,  we  lost  cases  because  some  examinees  had
enlisted  before  ASVAB  5/6/7  was  introduced  (January  1976).  We  lost  more
cases  because  of  incomplete  information  about  ASVAB  scores.  Some 
examinees  were  missing  one  or  more  subtest  scores,  and  we  deleted  them 
from  the  analysis. 

Most  of  the  lost  cases  resulted  from  missing  training  grades.

Had  we  been  able  to  maintain  a  large  sample  size  by  obtaining  more  ASVAB 
scores,  we  could  have  retrieved  more  of  these  data.  But  because  of  the 
large  number  with  missing  training  grades,  we  decided  not  to  engage  in 
an  expensive  clerical  search  for  more  complete  records  of  ASVAB  scores. 

Statistical  Analysis 


Each  skill  was  analyzed  separately.  The  first  objective  was  to 
establish  the  construct  validity  of  the  performance  measures.  For  our 
purposes,  the  construct  validity  of  the  performance  measures  is
established  when  we  determine  that  all  the  measures  are  consistent  indicators
of  performance.  In  the  analysis,  we  start  with  the  relationships  among 
the  units  of  tests  that  are  scored  separately.  These  units  are  the 
parts  of  the  hands-on  and  written  tests  described  earlier  in  the 
section,  Description  of  the  Proficiency  Tests.  The  correlation  among
the  parts  should  be  positive  because  each  part  is  designed  to  measure  a
cluster  of  job  requirements  in  the  skill.  A  negative  or  zero
correlation  among  the  parts  implies  either  that  the  content  of  that  requirement
is  different  from  the  other  job  requirements,  or  more  likely,  that  the
measurement  properties  of  that  part  are  faulty.  In  addition,  we
estimated  the  internal  consistency  reliability  of  the  proficiency  tests.
The  internal  consistency  reliability  reflects  the  intercorrelation  of
the  parts  or  items.  We  then  proceeded  to  combine  the  parts  to  form
larger  units.  Most  of  our  analyses  used  the  hands-on  test  score, 
written  test  score,  and  training  grade  for  each  examinee  as  the  units  of 
analysis.  The  intercorrelation  among  these  three  measures  should  be 
high.  If  the  correlations  are  high,  then  they  are  measuring  something 
in  common.  We  infer  that  the  common  dimension  among  the  measures  is  job 
performance.  The  construct  validity  of  the  measures  is  also  supported 
by  their  correlation  with  relevant  job  experience  and  enlisted  grade. 

As  a  rule,  people  with  more  experience  and  at  higher  enlisted  grades 
should  be  more  proficient  in  their  jobs.  More  details  about  the
statistical  analysis  are  presented  in  appendix  B.
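
The intercorrelation check described above can be illustrated with a short sketch that computes Pearson correlations for every pair of measures. The measure names and scores below are invented for illustration; they are not data from this study, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(measures):
    """Return {(name_a, name_b): r} for every pair of measures."""
    names = list(measures)
    return {(a, b): pearson(measures[a], measures[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical scores for five examinees on the three criterion measures.
example_measures = {
    "hands_on": [52, 61, 70, 48, 66],
    "written": [30, 41, 45, 28, 39],
    "training_grade": [78, 85, 90, 74, 88],
}
example_r = intercorrelations(example_measures)
```

If all pairwise values are positive and reasonably high, the measures share a common dimension, which the text interprets as job performance.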

In  addition,  the  correlation  between  the  performance  measures  and 
the  ASVAB  aptitude  composites  should  conform  to  our  a  priori
expectations.  The  predictive  validity  of  the  ASVAB  aptitude  composites  is
supported  by  more  than  40  years  of  research  and  experience.  If  the
performance  measures  are  indeed  measuring  job  performance,  the
Electronics  Repair  (EL)  aptitude  composite  should  have  the  highest  validity
of  all  the  ASVAB  composites  for  predicting  performance  in  the  Ground
Radio  Repair  skill.  Similarly,  the  Mechanical  Maintenance  (MM)
composite  should  have  the  highest  predictive  validity  in  the  Automotive
Mechanic  skill,  and  Combat  (CO)  in  the  Infantry  Rifleman  skill.  If 
other  aptitude  composites  have  higher  predictive  validity,  we  suspect 
that  the  performance  measure  may,  in  fact,  be  measuring  something  other 
than  job  performance. 
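
The a priori expectation stated above amounts to a simple check: for each skill, the composite with the highest validity should be the one named in the text. A minimal sketch follows; the skill-to-composite mapping comes from the text, while the function names and validity coefficients are hypothetical.

```python
# Expected best-predicting composite for each skill, as stated in the text.
EXPECTED_BEST = {
    "Ground Radio Repair": "EL",
    "Automotive Mechanic": "MM",
    "Infantry Rifleman": "CO",
}


def best_composite(validities):
    """Return the composite name with the highest validity coefficient."""
    return max(validities, key=validities.get)


def conforms_to_expectation(skill, validities):
    """True if the highest-validity composite is the one expected for the skill."""
    return best_composite(validities) == EXPECTED_BEST[skill]


# Hypothetical validity coefficients for one performance measure.
example_validities = {"EL": 0.60, "MM": 0.50, "CO": 0.45}
example_ok = conforms_to_expectation("Ground Radio Repair", example_validities)
```

A failed check does not by itself show that a measure is faulty; as the text notes, it only raises the suspicion that the measure reflects something other than job performance.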

The  analysis  was  directed  toward  examining  patterns  of
relationships  among  the  variables.  No  single  statistic  provides  sufficient
evidence  to  confirm  or  deny  that  a  measure  has  adequate  construct 
validity.  If  the  pattern  of  relationships  is  consistent  and  conforms  to 
our  prior  expectations,  then  we  are  more  confident  in  inferring  that  the 
common  dimension  running  through  the  measures  is  job  performance. 

The  key  to  establishing  the  construct  validity  of  the  performance 
measures  lies  in  the  way  they  were  constructed  in  the  first  place.  Job 
experts  must  agree  that  the  content  of  the  measure  is  based  on  job 
requirements.  Expert  judgment  establishes  the  job  relevance  of  the 
measures.  The  statistical  analysis  cannot  change  the  content;  it  can
only  indicate  whether  the  measures  are  behaving  as  expected  or
contain  unsatisfactory  degrees  of  error.
OVERVIEW


This  introduction  to  job  performance  measures  has  been  lengthy. 

One  reason  is  that  research  efforts  using  hands-on  performance  tests  as 
the  criterion  measures  are  done  infrequently.  Little  work  has  been  done 
to  validate  the  ASVAB,  or  any  other  aptitude  battery,  as  a  predictor  of 
objective  measures  of  job  performance.  The  measures  and  procedures 
needed  to  be  described  in  more  than  the  usual  detail.  A  second  reason 
is  that  developing  and  administering  performance  measures  is  a  complex 
process.  Many  things  can  go  wrong,  and  we  have  covered  only  the  most 
salient  sources  of  errors.  The  need  for  good  performance  measures  has 
long  been  recognized  by  the  personnel  research  community.  The  paucity 
of  prior  research  is  not  an  indication  of  lack  of  will  or  foresight; 
rather  it  attests  to  the  complexity  and  expense  of  obtaining  good 
measures  of  performance  that  have  high  manifest  content  validity.  The 
Department  of  Defense  is  taking  a  bold  step  by  requesting  the  armed 
services  to  validate  their  enlistment  standards  against  job  performance. 

In  chapter  2  we  present  evidence  supporting  the  construct  validity 
of  the  three  performance  measures.  In  chapter  3,  we  then  use  the
performance  measures  to  establish  minimum  prerequisite  scores  for
assignment  to  these  skills.
CHAPTER  2


EVALUATION  OF  PERFORMANCE  MEASURES 


INTRODUCTION 

The  purpose  of  evaluating  the  performance  measures  is  to  determine 
their  credibility.  Because  each  performance  measure  is  a  sample  of  job 
requirements  in  the  skill,  and  because  we  need  to  generalize  from
performance  on  each  measure  to  performance  in  the  skill,  each  measure
should  be  a  consistent  indicator  of  performance  in  the  skill.  We
determine  consistency  by  computing  the  intercorrelation  among  the  measures.
To  the  extent  that  the  measures  are  positively  intercorrelated,  they  are
measuring  the  same  thing,  and  they  are  consistent.  We  improve  our
estimate  of  performance  in  the  skill  by  adding  together  measures  that  are
consistently  related  to  other  measures.  Each  measure  must,  of  course, 
first  have  content  validity  as  judged  by  job  experts.  The  statistical 
analysis  examines  their  relationships  to  determine  how  well  we  can 
generalize  from  them  to  performance  in  the  skill. 

The  first  step  in  the  analysis  is  to  compute  the  correlation  among 
the  parts  of  hands-on  and  written  proficiency  tests.  Job  experts
determined  from  a  content  point  of  view  that  each  part  is  a  component  of
performance  in  the  skill.  If  a  part  is  negatively  related  to  other
parts  of  the  test,  then  the  scores  on  that  part  are  not  consistent.  In
performance  measurement,  the  direction  of  the  correlation  can  be
specified  beforehand  because  job  experts  specify  whether  a  high  or  low  score
is  indicative  of  high  performance.  If  after  inspection  the  part  appears
faulty,  it  should  be  deleted  from  the  test.  We  also  computed  the
internal  consistency  to  determine  the  extent  to  which  test  items  or
steps  in  the  hands-on  test  are  consistent  indicators  of  performance  in
the  skill.  A  high  internal  consistency  index  indicates  that  the
measures  are  consistent  evaluations  of  performance.
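
The text does not name the internal consistency index that was computed. A common choice is coefficient alpha, and the sketch below assumes that index purely for illustration; the function name and the scores are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(parts):
    """Coefficient alpha for k parts (or items) scored on the same examinees.

    `parts` is a list of k lists; position i in every list refers to the
    same examinee.
    """
    k = len(parts)
    if k < 2:
        raise ValueError("alpha requires at least two parts")
    totals = [sum(scores) for scores in zip(*parts)]  # total score per examinee
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    part_variance_sum = sum(pvariance(p) for p in parts)
    return (k / (k - 1)) * (1 - part_variance_sum / total_variance)


# Hypothetical part scores for five examinees on three parts of a test.
example_parts = [
    [12, 15, 9, 18, 14],
    [10, 14, 8, 17, 12],
    [11, 13, 10, 16, 15],
]
example_alpha = coefficient_alpha(example_parts)
```

Alpha rises as the parts correlate more strongly with one another, which matches the statement that internal consistency reflects the intercorrelation of the parts or items.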

The  second  step  is  to  examine  the  consistency  among  the  performance 
measures  and  other  indicators  of  performance.  Again  we  look  for
positive  correlation  among  the  performance  measures  and  with  other
indicators.  The  other  indicators  are  enlisted  grade  and  amount  of  job
experience.  People  who  have  more  experience  working  in  the  skill  and  whom  the
Marine  Corps  has  rewarded  by  promotion  in  the  skill  should  tend  to  have
higher  performance  scores.  Also,  the  pattern  of  correlation  with  ASVAB
aptitude  composites  should  conform  to  our  expectations  about  the
predictive  validity  of  the  composites.  If  the  correlations  do  not  conform  to
these  expectations,  we  would  suspect  that  the  performance  measures  are 
measuring  extraneous  factors  in  addition  to  job  competence. 

The  correlation  coefficients  for  the  samples  of  examinees  in  each 
skill  underestimate  the  values  for  the  full  population  of  potential 
recruits. The examinees have undergone a double selection. First, they
had to attain qualifying scores on the ASVAB at the time of enlistment
and assignment to training in the skill. Second, they had to pass the
training  course  before  they  were  allowed  to  work  in  the  skill.  The 
correlation  coefficients  among  the  performance  measures  and  with  other 
indicators  of  performance  will  be  corrected  for  prior  selection  of 
examinees  on  the  basis  of  their  ASVAB  scores.  To  the  extent  that  ASVAB 
scores  predict  grades  in  skill  training  courses,  the  correction  for 
selection  on  ASVAB  scores  also  corrects  for  failure  to  complete  the 
skill  training  course.  The  correction  estimates  the  correlation  that 
would  obtain  in  the  population  of  potential  recruits. 
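
The report does not give the correction formula at this point. One standard correction for this situation, where the sample was selected on a third variable (here an ASVAB composite) rather than on either of the two variables being correlated, is the univariate formula usually attributed to Thorndike (Case 3). The sketch below assumes that formula; the function and argument names are illustrative, and the study itself may have used a multivariate version.

```python
from math import sqrt


def correct_for_incidental_selection(r_xy, r_xz, r_yz, sd_ratio):
    """Estimate the population correlation between x and y.

    x, y     : two performance measures observed in the selected sample
    z        : the selection variable (for example an aptitude composite)
    r_xy, r_xz, r_yz : correlations observed in the selected sample
    sd_ratio : SD of z in the full population divided by SD of z in the sample
    """
    k = sd_ratio ** 2 - 1.0
    numerator = r_xy + r_xz * r_yz * k
    denominator = sqrt((1.0 + r_xz ** 2 * k) * (1.0 + r_yz ** 2 * k))
    return numerator / denominator


# Hypothetical sample values; with sd_ratio = 1.0 there is no restriction
# and the correlation is returned unchanged.
print(correct_for_incidental_selection(0.30, 0.40, 0.40, 1.0))
print(correct_for_incidental_selection(0.30, 0.40, 0.40, 1.5))
```

When the population standard deviation of the selection variable is larger than the sample value, and both measures correlate positively with the selection variable, the corrected coefficient is larger than the observed one.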

INTERCORRELATION  OF  PROFICIENCY  TEST  SCORES 

The  intercorrelations  presented  in  this  section  are  based  on  the 
examinees  who  were  administered  the  proficiency  tests.  The  original 
intent  was  that  the  examinees  would  be  stationed  at  Camp  Pendleton  and 
have  from  6  to  18  months  of  job  experience.  Because  there  were  not 
enough  people  assigned  to  the  Ground  Radio  Repair  or  Automotive  Mechanic 
skills  at  Camp  Pendleton,  Marines  from  other  sites  in  southern 
California  were  brought  there  for  testing.  Only  Marines  assigned  to 
these  skills  serving  in  their  first  enlistment  or  early  in  their  second 
enlistment  were  given  the  proficiency  tests.  All  examinees  in  the 
Infantry  Rifleman  skill  were  stationed  at  Camp  Pendleton.  Their  job 
experience  ranged  from  about  2  weeks  in  a  unit  to  over  2  years. 

Hands-on  Tests 

Ground  Radio  Repair 


The hands-on test for the Ground Radio Repair skill consisted of
ten defective circuit boards. The score for each board ranged from 0 to
14 (2 points for identifying the faulty symptom, 4 points for the faulty
circuit, and 8 for the faulty component). In table 1 we show the
intercorrelation of the ten circuit boards, the correlation of each board
with the total hands-on score, and the intercorrelation of the symptom,
circuit, and component scores, where each score is summed across the
ten boards. The correlation coefficients of the parts were not
corrected for selection of the examinees. All correlation coefficients are
positive, which indicates that each board is a consistent indicator of
performance. The intercorrelation of the symptom, circuit, and
component scores is also positive. The magnitude of the coefficients is of
less importance than their pattern. All coefficients should be
positive; if negative, the value should be small. These results support the
credibility of the hands-on test as a measure of proficiency.
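
The scoring rule for the circuit boards can be sketched as follows. The sketch assumes that the three elements of a board (symptom, circuit, component) are credited independently of one another; the report gives only the point values, so that detail and the function names are assumptions.

```python
SYMPTOM_POINTS = 2
CIRCUIT_POINTS = 4
COMPONENT_POINTS = 8


def board_score(symptom_found, circuit_found, component_found):
    """Score one circuit board on the 0 to 14 scale described in the text."""
    return (
        SYMPTOM_POINTS * bool(symptom_found)
        + CIRCUIT_POINTS * bool(circuit_found)
        + COMPONENT_POINTS * bool(component_found)
    )


def hands_on_total(boards):
    """Sum the board scores; boards is a list of (symptom, circuit, component) flags."""
    return sum(board_score(*flags) for flags in boards)


# Ten boards, all three elements identified on each: the maximum possible total.
print(hands_on_total([(True, True, True)] * 10))
```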

The  coefficients  in  table  1  are  based  on  154  Marines  assigned  to 
the  Ground  Radio  Repair  skill.  The  original  sample  consisted  of 
189  examinees,  but  35  cases  were  deleted  because  their  training  and  job 
experience  were  different  from  those  of  the  154  cases.  The  35  cases  are 
described in appendix B.

[Table 1: Intercorrelation of hands-on test, Ground Radio Repair (intercorrelation of the ten circuit boards). Table body not recovered.]


Automotive  Mechanic 


The  hands-on  test  for  the  Automotive  Mechanic  sample  consisted  of 
ten  parts.  Their  intercorrelation  is  shown  in  table  2.  The  sample  con¬ 
sisted  of  263  automotive  mechanics.  The  correlation  coefficients  are 
positive, except for the plugs part. The plugs part was retained
because its coefficients varied around zero rather than taking large
negative values. A zero or small negative correlation is acceptable for
parts of a test, but would be troublesome if found between the total
scores, or between the hands-on and written tests. Given that the job
experts had determined that automotive mechanics should know how to check
spark plugs, we decided that the zero correlations did not warrant
throwing out this part. The pattern of generally positive
intercorrelation supports the credibility of this hands-on test as a
measure of proficiency.

Infantry  Rifleman 


The  intercorrelation  of  the  hands-on  test  parts  for  the  Infantry 
Rifleman skill is shown in table 3. The hands-on test has five parts,
and all five are positively intercorrelated. Again, we conclude that
the credibility of the hands-on test as a measure of job proficiency is
supported.

Each of the three hands-on proficiency tests appears to be
measuring something in common. The content validity of the hands-on tests,
supported by the positive intercorrelation of the parts, suggests that
we can use the hands-on tests to help evaluate the extent to which the
other performance measures are in fact relevant to job requirements.

Internal  Consistency  of  the  Hands-on  Tests 


The  internal  consistency  reliability  index  is  a  function  of  the 
intercorrelation  among  the  parts  or  test  items.  To  the  extent  that  the 
parts  or  items  are  measuring  the  same  thing,  they  are  intercorrelated 
and the internal consistency index is high. We used special equations
to compute the internal consistency of a composite [2]. The equation
for the α index is the lower bound in that it does not consider the
reliability of the parts. It is analogous to the conventional equation
for computing the internal consistency of a test:

\[ \alpha = \frac{n}{n-1}\left(1 - \frac{\sum S_y^2}{S_T^2}\right) \]

where:

$\alpha$ = the internal consistency reliability, ranging from 0 to 1.0
$n$ = the number of parts or items in the test
$S_y^2$ = the variance of each part or item
$S_T^2$ = the variance of the total test.

TABLE 3

INTERCORRELATION OF HANDS-ON TEST: INFANTRY RIFLEMAN (a)

                           Part of hands-on test
Part                     1     2     3     4     5     Mean score   Standard deviation
1  Map and compass       -    .24   .42   .43   .15       34.2            15.2
2  First aid            .24    -    .17   .18   .06       16.5             5.7
3  Formations           .42   .17    -    .16   .06       14.5             5.6
4  Antitank             .43   .18   .16    -    .24       24.4            10.0
5  Firing               .15   .06   .06   .24    -        49.0            22.0
   Total                .71   .36   .42   .63   .73      138.6            37.6

(a) Number of cases is 384.


The second equation takes into account the reliability of the parts:

\[ r_{tt} = \frac{\sum_{y} S_y^2\, r_{yy} + \sum_{x \neq y} \mathrm{Cov}_{xy}}{\sum_{y} S_y^2 + \sum_{x \neq y} \mathrm{Cov}_{xy}} \]

where:

$r_{tt}$ = the internal consistency of the composite
$S_y^2$ = the variance of part y
$r_{yy}$ = the reliability of part y
$\mathrm{Cov}_{xy}$ = the covariance of parts x and y.

The estimated internal consistency of the hands-on tests using the two
equations is

                         Internal consistency
Skill                      α        r_tt
Ground Radio Repair       .81       .86
Automotive Mechanic       .60       .77
Rifleman                  .47       .69

For  computing  rtt,  we  assumed  the  reliability  of  each  part  to  be  .50, 
which  is  a  conservative  value.  The  variance  of  the  parts  and  total 
scores  used  in  the  computations  are  shown  in  appendix  C. 
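
The two equations can be checked against the Rifleman values in table 3. The sketch below assumes that the total score is the unweighted sum of the parts, so that the sum of the covariances equals the total variance minus the sum of the part variances; the function names are illustrative.

```python
def coefficient_alpha(part_sds, total_sd):
    """Lower-bound internal consistency from part and total standard deviations."""
    n = len(part_sds)
    sum_part_var = sum(sd ** 2 for sd in part_sds)
    return (n / (n - 1)) * (1.0 - sum_part_var / total_sd ** 2)


def composite_reliability(part_sds, total_sd, part_reliability=0.50):
    """Internal consistency of a composite, given one reliability for every part."""
    sum_part_var = sum(sd ** 2 for sd in part_sds)
    # Assumption: total variance = sum of part variances + sum of covariances.
    sum_cov = total_sd ** 2 - sum_part_var
    return (part_reliability * sum_part_var + sum_cov) / (sum_part_var + sum_cov)


# Standard deviations of the five Rifleman hands-on parts and of the total (table 3).
RIFLEMAN_PART_SDS = [15.2, 5.7, 5.6, 10.0, 22.0]
RIFLEMAN_TOTAL_SD = 37.6

print(round(coefficient_alpha(RIFLEMAN_PART_SDS, RIFLEMAN_TOTAL_SD), 2))
print(round(composite_reliability(RIFLEMAN_PART_SDS, RIFLEMAN_TOTAL_SD), 2))
```

With the table 3 standard deviations and an assumed part reliability of .50, both functions round to the values reported for the Rifleman test (.47 and .69).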

The  internal  consistency  indexes  indicate  that  troubleshooting  the 
ten  circuit  boards  in  the  Ground  Radio  Repair  skill  is  a  relatively 
homogeneous task, and the two indexes have similar values (.81 versus
.86). The nine maintenance tasks in the Automotive Mechanic hands-on
test are more heterogeneous, with values of .60 and .77, and the parts
of the Rifleman test are the most heterogeneous (.47 and .69). By
including  estimates  of  the  reliability  of  the  parts  for  the  Automotive 
Mechanic  and  Rifleman  hands-on  tests,  the  internal  consistency  indexes 
were  increased  appreciably.  The  relative  magnitude  of  the  internal 
consistency  indexes  conforms  to  the  intercorrelations  among  the  parts 
shown  in  tables  1,  2,  and  3. 

Written  Tests 


Ground  Radio  Repair 


The  intercorrelation  of  the  parts  of  the  written  proficiency  test 
for  the  Ground  Radio  Repair  skill  is  shown  in  table  4.  The  four  parts 
are positively intercorrelated. The internal consistency reliability of
the written test is .83, when the test items are used as the unit. When
the parts of the written test are used to estimate internal consistency,
the α index is .54 and the r_tt index (assuming each part has a
reliability of .50) is .70. The intercorrelation and reliability indicate
that  the  scores  of  the  written  test  are  accurate  measures  of  whatever  it 
is  that  the  test  is  measuring.  In  other  words,  we  expect  that  the 
examinees  would  reproduce  their  scores  reasonably  well  if  tested  again 
with  a  different  set  of  test  items.  We  conclude  that  the  credibility  of 
the  written  test  as  a  measure  of  job  proficiency  is  supported.  However, 
because  of  its  questionable  content  validity,  we  need  to  find  how  it 
correlates  with  the  hands-on  proficiency  test  and  the  other  measures  of 
performance  before  we  can  be  more  confident  that  it  is  in  fact  measuring 
job proficiency.

TABLE 4

INTERCORRELATION OF WRITTEN TEST: GROUND RADIO REPAIR (a)

                         Part of written test
Part                     1     2     3     4     Mean score   Standard deviation
1  General               -    .30   .23   .31       15.4             3.1
2  Meters               .30    -    .24   .15       19.4             4.4
3  Scopes               .23   .24    -    .26       11.6             5.7
4  Troubleshooting      .31   .15   .26    -         9.0             4.3
   Total                .56   .67   .75   .63       55.4            11.1

(a) Number of cases is 154.


Automotive  Mechanic 

The  intercorrelation  of  the  written  parts  for  the  Automotive 
Mechanic  skill  is  shown  in  table  5.  The  items  on  the  fuel  and  cooling 
systems  and  the  5-ton  multifuel  vehicle  have  a  pattern  of  positive 
intercorrelation,  but  the  items  on  the  steering  system  have  a  low  corre¬ 
lation  with  the  other  three  parts.  In  general,  the  parts  of  the  written 
test  for  the  Automotive  Mechanic  skill  are  consistent  measures.  The 
internal  consistency  reliability  of  the  test  is  .77  when  the  items  are 
used as the unit. When the parts are used as the unit, the α index is
.36, and the r_tt index (assuming the reliability of each part is .50) is
.64. The written test for the Automotive Mechanic skill has sufficient
credibility  to  warrant  further  analysis  as  a  measure  of  performance. 

Infantry  Rifleman 


The  written  test  for  the  Infantry  Rifleman  skill  consisted  of 
nine parts. The nine parts have a pattern of positive intercorrelation
(table 6). Only one part, handling of prisoners-1, has correlation
coefficients close to zero. We did not attempt to compute the internal
consistency reliability using items as the unit, because the complex
scoring rules preclude using conventional formulas for computing test
reliability. Using the part of the test as the unit, the α index is .66,
and the r_tt index (assuming reliability of .50 for each part) is .77.

The  pattern  of  positive  intercorrelation  and  the  internal  consistency 
indexes  show  that  the  parts  are  measuring  something  in  common,  and  we 
can use the written test in the subsequent analysis.

TABLE 5

INTERCORRELATION OF WRITTEN TEST: AUTOMOTIVE MECHANIC (a)

                             Part of written test
Part                 Fuel   Steering   Cooling   5-ton   Mean score   Standard deviation
Fuel                  -       .13        .20      .31        6.0             2.0
Steering             .13       -         .01     -.01        1.8             0.8
Cooling              .20      .01         -       .40        5.6             1.4
5-ton multifuel      .31     -.01        .40       -        20.0             5.8
Total                .56      .14        .55      .94       33.4             7.4

(a) Number of cases is 263.


The  hands-on  and  written  proficiency  tests  for  all  three  skills 
have passed the first analysis to determine their credibility. The
parts of each test have a satisfactory pattern of positive
intercorrelation and satisfactory internal consistency reliability. In
this first step of the analysis, we were looking for large negative
coefficients that would point to faulty measures. Because we found none,
we combined the parts for each test to obtain measures that encompass
more of the range of job requirements in each skill.

The  intercorrelation  of  the  parts  is  generally  low,  with  only  a  few 
coefficients  as  high  as  .4  or  .5.  One  reason  they  are  low  is  that  the 
parts  are  usually  short  and  therefore  unreliable.  Another  reason  is 
that  the  samples  included  only  people  who  were  qualified  to  work  in  the 
skill.  Those  who  failed  to  qualify  for  assignment  to  the  skill  because 
of  low  ASVAB  scores  or  failure  in  the  training  course  were  not  available 
for  testing.  In  subsequent  analyses,  we  present  two  sets  of  correlation 
coefficients.  One  is  for  the  samples  of  selected  examinees,  called 
"uncorrected  correlation,"  and  the  second  is  the  estimated  correlation 
for  the  full  population  of  potential  recruits,  called  "corrected 
correlation."  The  corrected  values  tend  to  be  larger  because  they  apply 
to  the  full  range  of  potential  scores.  For  purposes  of  setting  enlist¬ 
ment  standards  and  aptitude  composite  prerequisites  for  assignment  to 
skill  training  courses,  the  corrected  values  are  the  appropriate  ones  to 
use.  Although  we  report  both  sets  of  correlations,  our  main  interpreta¬ 
tions  will  be  of  the  corrected  values. 

[Table 6: Intercorrelation of written test, Infantry Rifleman. Table body not recovered.]

EFFECTS OF TEST ADMINISTRATORS ON HANDS-ON TEST SCORES

For the Ground Radio Repair and Automotive Mechanic skills, the
same test administrator gave all parts of the hands-on test to any one
examinee. Therefore, different scoring standards used by test
administrators systematically raised or lowered the hands-on scores for all
examinees  tested  by  the  same  administrator.  To  determine  the  effects  of 
test  administrators  on  hands-on  scores,  we  grouped  the  examinees  for 
each  test  administrator  and  computed  test  score  means  and  standard 
deviations.  If  the  examinees  were  assigned  randomly  to  administrators, 
then  the  test  scores  should  differ  only  by  chance.  In  table  7,  we  show 
the  hands-on  test  scores  for  the  examinees  tested  by  the  same  adminis¬ 
trator.  For  comparison  we  also  show  the  written  test  scores  and 
relevant  aptitude  composite  scores. 
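
The grouping step can be sketched as below. The record layout, the administrator labels, and the scores are hypothetical, and the sample standard deviation (dividing by n - 1) is an assumption, since the report does not say which form was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_administrator(records):
    """Group (administrator, score) pairs and return {administrator: (n, mean, sd)}."""
    groups = defaultdict(list)
    for administrator, score in records:
        groups[administrator].append(score)
    summary = {}
    for administrator, scores in groups.items():
        # A standard deviation needs at least two scores.
        sd = stdev(scores) if len(scores) > 1 else 0.0
        summary[administrator] = (len(scores), mean(scores), sd)
    return summary


# Hypothetical hands-on scores for two administrators (not study data).
example_records = [(1, 100), (1, 120), (2, 70), (2, 90), (2, 80)]
for administrator, stats in summarize_by_administrator(example_records).items():
    print(administrator, stats)
```

If examinees were assigned to administrators at random, large differences between the group means would point to differences in scoring standards rather than in examinee competence.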

The  hands-on  test  scores  for  both  skills  show  a  large  variation 
among  administrators.  Differences  between  administrators  are  not 
related  to  differences  on  the  written  test  or  aptitude  composite  scores 
(Electronics  Repair  for  Ground  Radio  Repair  and  Mechanical  Maintenance 
for  Automotive  Mechanic).  For  the  Ground  Radio  Repair  skill,  the 
hands-on  means  range  from  70.0  (for  administrator  6  who  tested  only 
two  examinees)  to  127.3  (for  administrator  2  who  tested  13  examinees). 
For the Automotive Mechanic skill, the hands-on means range from 67.8
(for  administrator  2  who  tested  64  examinees)  to  75.1  (for  adminis¬ 
trator  1  who  tested  74  examinees  before  19  September  1981).  After 
19  September  1981,  the  mean  score  for  administrator  1  dropped  to  71.8. 
The  reason  for  the  drop  is  that  prior  to  19  September  1981  the  adminis¬ 
trators  provided  many  clues  to  examinees  about  correct  answers.  After 
19  September  they  were  instructed  to  refrain  from  providing  as  much 
help.  Test  administrator  2  did  most  of  his  testing  after  19  September 
(hands-on  mean  of  67.8),  whereas  administrator  4  did  all  of  his  before 
(hands-on  mean  of  73.9).  Administrator  3  tested  half  before  and  half 
after,  but  was  relatively  lenient  in  both  periods. 

The hands-on scores were put on the same score scale by
standardizing the scores for each administrator to have a mean of 50 and a
standard deviation of 10. We used the conventional formula:

\[ \text{Standard score} = 50 + \frac{10\,(X_i - \bar{X}_i)}{S_{x_i}} \]

where:

$X_i$ = hands-on test score, for each examinee tested by administrator i
$\bar{X}_i$ = the mean of hands-on scores assigned by administrator i
$S_{x_i}$ = the standard deviation of hands-on scores assigned by
administrator i.


The standard scores remove differences among administrators in the mean
and standard deviation of the scores assigned to the examinees they
tested. However, standard scores do not change the shape of the
distribution. For example, if an administrator tends to assign many
scores close to the maximum possible score, the standard scores will
also be piled up at the high end. Also, within each administrator's
group, the standard scores do not change the correlation between
hands-on scores and any other variable. Standard scores are linear
conversions of raw scores to remove leniency or strictness effects. If
the hands-on scores have other defects, they are retained.

[Table 7: Test scores shown by test administrator. Table body not recovered. Surviving footnotes: Test administrators did not sign the score sheets for the first 44 examinees. Examinees tested before 19 September 1981. Examinees tested after 19 September 1981.]
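
The sketch below applies the standard score formula separately to each administrator's examinees. The administrator labels and raw scores are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption, since the report does not say which form was used.

```python
from statistics import mean, pstdev


def standardize_by_administrator(records):
    """Convert (administrator, raw_score) pairs to standard scores (mean 50, SD 10).

    Scores are standardized within each administrator's group and returned
    in the same order as the input records.
    """
    by_admin = {}
    for administrator, score in records:
        by_admin.setdefault(administrator, []).append(score)
    stats = {a: (mean(s), pstdev(s)) for a, s in by_admin.items()}

    standard_scores = []
    for administrator, score in records:
        group_mean, group_sd = stats[administrator]
        if group_sd == 0:
            # No spread in this group: every examinee sits at the mean.
            standard_scores.append(50.0)
        else:
            standard_scores.append(50.0 + 10.0 * (score - group_mean) / group_sd)
    return standard_scores


# A lenient administrator ("A") and a strict one ("B"), hypothetical data.
example_records = [("A", 100), ("A", 120), ("A", 140), ("B", 60), ("B", 80), ("B", 100)]
print(standardize_by_administrator(example_records))
```

In this example the two administrators differ by 40 raw points on average, yet examinees in the same relative position receive the same standard score.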

The hands-on test scores for the Radio Repair skill do have a
measurement  defect  that  is  retained  in  the  standard  scores.  The  maximum 
possible  hands-on  score  is  140.  The  mean  scores  for  administrators  1 
through  4  range  from  115.5  to  127.3,  with  standard  deviations  from  15.1 
to  24.6.  Of  the  154  examinees,  14  had  perfect  scores  of  140,  and  30 
scored  135  or  better.  The  large  number  of  high  scores  suggests  either 
that  the  test  was  too  easy  or  that  the  scores  do  not  reflect  the  true 
competence  level  of  the  examinees.  The  relatively  low  mean  for  the 
first  44  examinees  (100.6)  tested  before  the  administrators  signed  the 
score  sheet  indicates  that  the  administrators  became  more  lenient  after 
they  started  signing  their  names.  The  number  of  high  scores  raises 
questions  about  the  measurement  accuracy  of  the  hands-on  scores.  In 
subsequent  analyses,  we  will  examine  the  hands-on  scores  further  to  see 
how  satisfactorily  they  function  as  measures  of  performance. 

The  hands-on  scores  for  the  Automotive  Mechanic  skill  also  were 
piled  up  at  the  high  end.  The  maximum  score  is  81,  and  over  20  percent 
of  the  examinees  had  scores  of  79,  80,  or  81.  For  administrator  1  we 
standardized  the  scores  separately  for  examinees  tested  before  or  after 
19  September  1981.  For  each  of  the  other  three  administrators,  we  com¬ 
puted  a  single  set  of  standard  scores,  disregarding  the  time  of  testing. 
For  the  Automotive  Mechanic  skill  we  also  show  the  mean  hands-on  testing 
time  each  administrator  allowed  the  examinees.  The  maximum  time  was 
210  minutes,  and  no  administrator  consistently  approached  this  limit. 

For  the  Infantry  Rifleman  sample,  the  hands-on  test  scores  were 
used  as  assigned  by  test  administrators,  with  no  conversion  to  standard 
scores.  The  distribution  of  hands-on  scores  is  satisfactory.  The 
maximum  possible  hands-on  score  is  332,  and  the  mean  for  384  examinees 
is  138.7,  with  a  standard  deviation  of  37.6.  From  a  measurement  point 
of  view,  the  hands-on  scores  for  the  Infantry  Rifleman  sample  do  not 
have  any  obvious  defects. 

INTERCORRELATION  OF  PERFORMANCE  MEASURES 

After  standardizing  the  hands-on  scores,  we  examined  their  inter¬ 
relationship  and  their  correlation  with  other  variables:  final  course 
grades  in  skill  training  courses,  enlisted  grade,  and  job  experience. 

The analysis included the hands-on scores assigned by test
administrators, called total score, and the standard scores for the Radio
Repair and Mechanic samples. Because the standard scores remove
differences in scoring standards among administrators, they should
correlate more highly with the other measures than do the total scores.

The hands-on tests for the Radio Repair and Mechanic samples have
both a total score, based on the number of steps scored as pass, and the
time taken to complete the hands-on test. Both scores may provide
useful information about level of performance. We computed an efficiency
score for these two samples as the hands-on score divided by
testing time. The efficiency score shows the amount of performance per
unit of time. In general, higher performance is indicated by both a
higher total score, based on the number of tasks performed, and a
shorter time taken to perform the tasks. No efficiency score was
computed for the Infantry Rifleman sample because the test content did not
provide a meaningful measure of time to complete the test.
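
A minimal sketch of the efficiency score follows. The unit of time (minutes) and the example values are assumptions made for illustration.

```python
def efficiency_score(hands_on_score, testing_minutes):
    """Hands-on score divided by testing time: performance per unit of time."""
    if testing_minutes <= 0:
        raise ValueError("testing time must be positive")
    return hands_on_score / testing_minutes


# Two hypothetical examinees with the same hands-on score; the faster one
# receives the higher efficiency score.
print(efficiency_score(75, 150))
print(efficiency_score(75, 200))
```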

Ground  Radio  Repair 


The  correlation  among  the  measures  for  the  Radio  Repair  sample  is 
shown  in  table  8.  Part  A  shows  the  coefficients  computed  on  the  sample, 
and  part  B  shows  the  estimated  correlation  in  the  population  of  poten¬ 
tial  recruits.  The  sample  size  in  part  A  is  129  for  the  intercorrela¬ 
tion  of  hands-on  test,  written  test,  enlisted  grade,  and  job  experience 
and  59  for  course  grades.  The  standard  errors  of  the  correlation 
coefficients  are  indicated  in  table  8.  The  sample  size  was  reduced 
because  of  incomplete  data.  In  appendix  B  we  present  more  complete  data 
for  the  samples  of  examinees.  All  coefficients  in  part  B  are  based  on 
59  cases  for  which  complete  data  were  available.  All  examinees  in 
enlisted grade E-1 were removed because anyone in this grade had been
demoted. 

The  intercorrelation  among  the  measures  has  the  expected  positive 
pattern.  The  three  performance  measures — hands-on  test,  written  test, 
and  course  grade — are  consistent,  and  therefore  they  support  the 
validity  of  each  other  as  measures  of  job  performance  in  the  Ground 
Radio  Repair  skill.  They  have  the  expected  positive  correlation  with 
enlisted  grade  and  job  experience.  (In  appendix  A  we  describe  how  job 
experience  was  measured.)  The  corrected  coefficients  in  part  B  are 
large  and  positive.  The  magnitude  of  these  coefficients  shows  that  the 
three  performance  measures  are  measuring  something  in  common,  and  the 
high  correlation  of  the  hands-on  test  with  the  others  supports  the 
content  validity  of  all  measures.  The  evidence  is  strong  that  the 
measures  of  performance  have  satisfactory  measurement  properties. 

In  part  A  of  table  8  the  standard  scores  for  the  hands-on  test  have 
higher  correlation  coefficients  with  the  other  measures  than  do  the 
other  hands-on  scores  (total,  which  includes  differences  in  scoring 
standards  of  test  administrators;  time  to  complete  the  test;  and 
efficiency,  or  total  score  divided  by  testing  time).  We  retained  the 
standard  scores  in  subsequent  analysis  and  deleted  the  other  hands-on 
scores.  The  hands-on  score  in  part  B  and  the  other  performance  measures 
are  on  the  standard  score  scale,  with  a  mean  of  50  and  standard 
deviation of 10.

[Table 8: Correlation of performance measures, Ground Radio Repair. Table body not recovered.]

The correlations of interest for course grades are with the hands-on
and written proficiency test scores. Both the uncorrected and
corrected  coefficients  are  satisfactory.  These  results  support  the 
traditional  use  of  grades  in  skill  training  courses  as  the  criterion 
measure  for  validating  ASVAB  and  for  establishing  enlistment 
qualification  scores. 

The magnitude of the corrected coefficients is considerably higher
than that of the uncorrected ones. The reason is that the minimum
qualifying aptitude composite score for the Ground Radio Repair skill
was 110, which eliminates the bottom two-thirds of the population. The
mean  Electronics  Repair  (EL)  aptitude  composite  score  for  the  sample  was 
about  118,  which  corresponds  to  a  percentile  score  of  about  80.  The 
standard  deviation  of  the  performance  measures  (part  B)  increased  by 
about  40  to  50  percent  in  the  population  compared  to  the  sample.  These 
large  increases  reflect  the  severe  selection  of  recruits  who  are  eligi¬ 
ble  to  become  radio  repairers  in  the  Marine  Corps.  With  such  severe 
selection,  the  corrected  values  may  be  in  error;  the  bias,  however,  is 
that  the  corrected  values  tend  to  be  underestimates  of  the  true  popula¬ 
tion  values.  As  we  shall  see  in  the  following  subsections,  these 
results  are  consistent  with  those  for  the  Automotive  Mechanic  and 
Infantry  Rifleman  skills,  which  increases  our  confidence  that  the 
corrected  values  are  reasonably  accurate. 

Automotive  Mechanic 


The  performance  measures  for  the  Automotive  Mechanic  sample  are 
consistent  measures  of  job  performance  (table  9).  The  sample  size  is 
131  cases  for  all  correlation  coefficients.  The  uncorrected  correlation 
coefficients  (part  A,  table  9)  have  the  desired  pattern  of  positive 
values,  except  for  amount  of  time  to  complete  the  hands-on  test,  which 
should  be  negatively  correlated  with  the  other  measures.  The  magnitude 
of  the  estimated  population  coefficient  (part  B)  is  adequate  to  support 
the  content  validity  of  the  performance  measures  (hands-on  test,  written 
test,  and  course  grade).  Taken  together  with  the  Radio  Repair  skill, 
the  results  indicate  that  job  performance  in  technical  skills  can  be 
measured reliably.

The  hands-on  test  conveys  the  most  meaning  when  the  efficiency 
scores are used. In this sample, we computed efficiency as the ratio of
standard score to testing time. The hands-on scores in part B, and in
subsequent  analyses,  are  the  efficiency  scores.  The  standard  scores, 
according  to  the  correlation  coefficients,  are  more  accurate  measures 
than  the  total  hands-on  scores,  which  include  differences  among  test 
administrators.  By  also  including  time  in  the  hands-on  score,  the 
correlation  with  other  measures  is  further  increased.  For  example,  the 
uncorrected  correlation  coefficient  between  the  efficiency  score  and  the 
written  test  is  .35,  compared  to  .26  between  the  standard  score  and 
written test. The other three scores for the hands-on test (total,
standard, and time) help support the meaning of the hands-on test
scores, but they are not as useful as the efficiency score.

[Table 9. Table body not recovered. Surviving footnotes: Number of cases is 131; probability is .95 that coefficients of .17 are greater than zero. Reported in raw scores, except for course grades, which are standardized scores. Reported in standard scores.]
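
The table 9 footnote says that, with 131 cases, coefficients of .17 are greater than zero with probability .95. The report does not say how that threshold was obtained. As a hedged illustration, the sketch below uses the Fisher z approximation with a two-sided .95 criterion, which is an assumption made here, to find the smallest correlation that differs from zero for a given sample size.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n_cases, confidence=0.95):
    """Smallest correlation that differs from zero at the given two-sided confidence.

    Uses the Fisher z approximation: atanh(r) has standard error
    1 / sqrt(n_cases - 3) when the true correlation is zero.
    """
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return tanh(z_crit / sqrt(n_cases - 3))


print(round(critical_r(131), 2))
```

Under these assumptions the threshold for 131 cases rounds to .17, which agrees with the footnote.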

Training  course  grades  had  satisfactory  correlation  with  the  other 
measures,  and  they  were  retained  as  a  performance  measure.  As  for  the 
Radio  Repair  skill,  we  found  the  three  measures  (part  B)  to  be 
meaningful  and  useful  for  evaluating  job  performance. 

Infantry  Rifleman 


Analysis  of  the  Infantry  Rifleman  skill  was  complicated  because 
about  half  of  the  examinees  had  taken  ASVAB  5/6/7  at  time  of  enlistment 
and  the  remainder  had  taken  ASVAB  8/9/10.  The  former  group  had  been  in 
the  Marine  Corps  longer  than  the  latter  group  (mean  months  of  service 
was  16.8  versus  8.9).  To  compute  the  intercorrelation  of  the  perfor¬ 
mance  measures,  we  combined  the  two  groups  and  obtained  241  cases.  An 
exception  is  for  course  grades,  for  which  we  used  only  53  cases  with 
ASVAB  8/9/10  scores.  The  uncorrected  coefficients  are  shown  in 
table  10,  part  A.  The  corrected  values  (part  B)  are  based  on  the 
subtests  common  to  ASVAB  5/6/7  and  ASVAB  8/9/10.  Details  are  presented 
in  appendix  B. 

The intercorrelations of the three performance measures (hands-on
test, written test, and course grades) have the desired pattern of
positive  values  (part  B  of  table  10).  The  magnitude  of  the  corrected 
coefficients  is  smaller  than  for  the  two  technical  skills.  The  lower 
values  could  be  a  function  of  the  job  requirements  or  of  test  content. 
The  results  suggest  that  the  latter  explanation  is  more  plausible. 

Job  experience,  measured  as  months  in  the  Marine  Corps,  has  a 
negative  correlation  with  the  proficiency  tests;  enlisted  grade  is 
essentially  uncorrelated  with  the  performance  measures.  These  results 
are  counter  to  our  expectations.  An  explanation  is  that  some  of  the 
content  of  the  proficiency  tests  may  reflect  content  of  the  training 
course  that  is  not  used  often  on  the  job;  perhaps  many  examinees  tended 
to  forget  some  of  the  content  specific  to  the  training  course  by  the 
time  they  took  the  test.  We  also  computed  the  correlation  between  time 
in  the  Marine  Corps  and  the  proficiency  tests  for  the  group  of  53  that 
was  tested  with  ASVAB  8/9/10.  The  correlation  coefficients  of  time  in 
service  with  the  hands-on  test,  written  test,  and  course  grade  were 
-.29,  +.05,  and  -.29,  respectively.  In  addition,  there  is  a  slight 
tendency  for  the  more  recent  accessions  to  have  higher  ASVAB  scores, 
which  also  helps  explain  the  negative  correlation  coefficients. 

Course  grades  correlate  quite  well  with  the  performance  measures. 
This  suggests  that  similar  types  of  content  are  included  in  both  the 
tests  in  the  training  course  and  in  the  proficiency  tests.  Course 
grades  in  the  Infantry  Training  School  were  heavily  determined  by  paper- 
and-pencil,  multiple  choice  achievement  tests.  The  students  were 
required  to  recall  what  they  learned  during  the  course.  The  content  of 
the  hands-on  and  written  proficiency  tests  also  involves  memory  of  facts 
and  rules  taught  during  the  training  course,  such  as  memory  of  acronyms 
and hand signals. See appendix A for a more detailed description of the
test  content.  Apparently  some  of  the  proficiency  test  content  was  not 
adequately  reinforced  during  their  training  in  the  field  after  gradua¬ 
tion  from  the  course,  and  some  of  the  examinees  forgot  much  of  it. 

The  performance  measures  in  each  of  the  three  skills  are  measuring 
something  in  common.  The  consistently  high  correlation  of  the  hands-on 
test  with  the  other  measures  supports  the  content  validity  of  all 
measures.  The  results  increase  our  confidence  that  the  measures  are  in 
fact  evaluating  job  performance.  We  now  turn  to  the  validity  of  ASVAB 
aptitude  composites  to  provide  further  evidence  to  support  the  content 
validity  of  the  performance  measures. 

VALIDITY  OF  ASVAB  APTITUDE  COMPOSITES 

ASVAB  aptitude  composites  traditionally  have  been  developed  using 
grades  in  skill  training  courses  as  the  criteria.  An  exception  has  been 
the  Combat  (CO)  aptitude  composite  the  Army  and  Marine  Corps  use  to 
assign  recruits  to  combat  arms  skills  (such  as  infantry,  armor).  For 
these  skills,  ratings  of  performance  in  combat  during  the  Korean  and 
Vietnam  conflicts  have  been  used  as  the  primary  criteria.  The  defini¬ 
tions  of  most  aptitude  composites,  in  terms  of  subtests  in  each,  have 
remained  relatively  stable  since  the  classification  batteries  were 
introduced  in  the  late  1940s.  The  Clerical  (CL)  or  Administrative 
composite  has  included  tests  of  verbal  skills  and  of  perceptual  speed 
and  accuracy.  The  Mechanical  Maintenance  (MM)  composite  has  included 
automotive information; Electronics Repair (EL), electrical or
electronics information; and General Technical (GT), verbal and quantitative
skills.  The  definitions  of  the  aptitude  composites  have  been  reasonable 
to  experienced  personnel  managers,  and  they  were  derived  from  empirical 
data. 


Because  of  their  longstanding  use  and  acceptance  by  DoD  personnel 
managers,  aptitude  composites  can  help  establish  the  credibility  of  the 
performance  measures.  For  the  Ground  Radio  Repair  skill,  the  EL  apti¬ 
tude  composite  should  have  a  higher  predictive  validity  than  either  GT, 
which  tends  to  measure  academic  skills,  or  CL,  which  is  appropriate  for 
office  jobs.  For  the  Automotive  Mechanic  skill,  the  MM  composite  should 
have  a  higher  validity  than  GT  or  CL. 

For  the  Infantry  Rifleman  skill,  our  a  priori  expectations  are  not 
as  clear.  The  job  requirements  for  riflemen  in  combat  are  difficult  to 
define,  and  then  to  capture  the  requirements  in  a  paper-and-pencil  test 
battery  is  even  more  difficult.  The  types  of  items  found  most  predictive 
of  combat  performance  during  the  Korean  conflict  were  self-descriptions. 
These  items  were  incorporated  into  the  Classification  Inventory  that  was 
part  of  the  Army  Classification  Battery  and  ASVAB  since  1958.  Items  in 
the  Classification  Inventory  were  updated  during  the  Vietnam  conflict. 
The Classification Inventory was dropped from ASVAB 8/9/10 in 1980, and
the  CO  composite  no  longer  contains  self-description  items.  For  the 
Infantry  Rifleman  skill,  we  may  gain  some  insight  into  the  job  require¬ 
ments  by  the  predictive  validity  of  different  aptitude  composites. 

Ground  Radio  Repair 


For  the  Ground  Radio  Repair  skill,  the  EL  aptitude  composite  has 
the  highest  validity  for  predicting  the  written  test  and  course  grade, 
but  not  for  predicting  the  hands-on  test  (table  11).  The  uncorrected 
correlation  coefficients  cannot  be  compared  directly  because  of  the 
severe  selection  of  the  examinees  on  the  basis  of  their  EL  scores.  The 
corrected  validity  coefficients  have  been  made  comparable  by  estimating 
the  values  in  an  unselected  population.  The  pattern  of  uncorrected  and 
corrected  validity  coefficients  is  the  same;  a  consistent  result  is  that 
EL  has  a  lower  predictive  validity  against  the  hands-on  test  than  does 
either  the  GT  or  CL  composite. 
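
The idea of estimating a validity coefficient in an unselected
population can be illustrated with a small sketch. The code below uses
the univariate correction for direct selection on the predictor
(often called Thorndike's Case 2). This is an assumption made for
illustration only; the correction actually used in this study is
described in appendix B and may differ. The function name and the
sd_ratio input are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in an unselected population.

    r_restricted: correlation observed in the selected sample.
    sd_ratio: predictor standard deviation in the unselected population
              divided by the predictor standard deviation in the
              selected sample (a hypothetical input; not given in the text).
    Assumes direct selection on the predictor and a linear relationship.
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(
        1.0 - r_restricted ** 2 + (r_restricted * sd_ratio) ** 2
    )
    return numerator / denominator


# Illustrative call: an observed coefficient of .43 and an assumed
# population-to-sample SD ratio of 2 (the ratio is made up for the example).
corrected = correct_for_range_restriction(0.43, 2.0)
```

When the selected sample is more homogeneous than the population
(sd_ratio greater than 1), the corrected coefficient is larger than the
observed one, which is the direction of change seen in the tables.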


TABLE 11

VALIDITY OF ASVAB APTITUDE COMPOSITES:
GROUND RADIO REPAIR(a)

                         ASVAB aptitude composite

                      Uncorrected                  Corrected
Performance
measure           EL(b)   GT(c)   CL(d)     EL        GT         CL

Hands-on test      .21     .36     .32      .59   [illegible]    .62
Written test       .34     .33     .25      .73       .69        .61
Course grade       .43     .30     .23      .75       .62        .57

(a) Number of cases is 59.
(b) Electronics Repair.
(c) General Technical.
(d) Clerical.


The  pattern  of  corrected  coefficients  supports  the  content  validity 
of the other two performance measures (written test and course grades).
The  content  validity  of  the  hands-on  test  does  not  need  empirical 
support.  The  relatively  high  validity  of  the  GT  composite,  which  is 
largely  a  measure  of  academic  aptitude,  suggests  that  the  hands-on  test 
contains  a  component  of  general  mental  ability,  as  well  as  skills  and 
knowledge  specific  to  electronics  repair.  The  hands-on  test  required 
the  examinees  to  apply  their  skills  and  knowledge  in  a  novel  situation 
by troubleshooting a new piece of equipment. The relatively high
validity of the aptitude composites against the hands-on test counters the
argument  advanced  by  some  people  that  although  the  ASVAB  can  predict 
success  in  training  courses,  it  cannot  predict  hands-on  performance  of 
job  tasks. 

The magnitude of the corrected EL validity coefficients (from .59
to .75) shows that EL is able to predict performance in the Ground Radio
Repair  skill.  Even  though  the  pattern  does  not  conform  to  prior 
expectations,  the  absolute  values  are  satisfactory. 

Automotive  Mechanic 


Of  the  three  aptitude  composites  shown  in  table  12,  MM  has  the 
highest  predictive  validity  against  all  three  performance  measures. 
Course  grades  are  especially  predictable  by  MM.  The  corrected  validity 
coefficient  is  .83,  and  even  the  uncorrected  value  is  .73.  The  pattern 
of  validity  conforms  to  our  a  priori  expectations,  and  the  content 
validity  of  the  measures  is  supported. 

The  hands-on  test  for  the  Automotive  Mechanic  skill  required 
examinees  to  perform  tasks  on  which  they  had  been  trained  and  on  which 
they  should  have  had  numerous  opportunities  to  perform  as  part  of  their 
normal  job  duties  (working  on  the  quarter-ton  Jeep). 

The  magnitude  of  the  corrected  validity  coefficients  indicates  that 
the  ASVAB  is  a  good  predictor  of  performance  in  the  Automotive  Mechanic 
skill.  Recruits  can  be  assigned  as  mechanics  on  the  basis  of  their  MM 
scores,  and  their  job  performance  will  be  reasonably  consistent  with 
their  aptitude  scores. 


TABLE 12

VALIDITY OF ASVAB APTITUDE COMPOSITES:
AUTOMOTIVE MECHANIC(a)

                         ASVAB aptitude composite

                      Uncorrected              Corrected
Performance
measure           MM(b)   GT(c)   CL(d)     MM      GT      CL

Hands-on test      .49     .23     .23      .56     .39     .39
Written test       .49     .32     .23      .65     .55     .48
Course grade       .73     .58     .46      .83     .75     .66

(a) Number of cases is 131.
(b) Mechanical Maintenance.
(c) General Technical.
(d) Clerical.

Infantry Rifleman


The validities of the aptitude composites for the Infantry Rifleman
skill are shown in table 13. In part A, the results are shown for
examinees  tested  with  ASVAB  8/9/10,  and  in  part  B,  for  those  tested  with 
ASVAB  5/6/7.  The  pattern  of  validity  coefficients  does  not  conform  to 
our  prior  expectations.  GT  is  the  most  valid  predictor  of  all  perfor¬ 
mance  measures,  except  for  the  hands-on  test  in  the  group  tested  with 
ASVAB  8/9/10.  In  this  group,  CO  has  a  corrected  validity  of  .58  com¬ 
pared  to  .53  for  GT.  The  proficiency  tests  are  more  predictable  than 
course  grades.  The  difference  in  size  of  validity  coefficients  suggests 
that  course  grades  may  be  measuring  things  somewhat  different  from  the 
hands-on  and  written  tests.  Because  the  proficiency  tests  were  devel¬ 
oped  by  job  experts  explicitly  to  measure  job  requirements,  we  can  be 
more  confident  of  their  content  validity  than  of  the  course  grades. 

Several results, however, raise questions about the meaning
of the proficiency test scores for the Rifleman skill. The ASVAB
subtests  with  the  highest  predictive  validity  against  the  proficiency 
tests  measure  verbal  ability  (the  validity  of  the  ASVAB  subtests  is 
presented in appendix B). Although verbal ability is important for
riflemen,  it  does  not  ordinarily  come  to  mind  as  a  prime  requirement  for 
success  in  the  skill.  From  the  results  on  the  consistency  of  perfor¬ 
mance  measures  for  the  Infantry  Rifleman  skill,  we  are  left  with  some 
doubt  about  the  content  validity  of  any  of  the  measures.  The  magnitude 
of  the  corrected  validity  coefficients  (ranging  from  .5  to  .6)  indicates 
that  the  ASVAB  is  a  reasonably  valid  predictor  of  success  in  the 
Infantry  Rifleman  skill,  as  success  is  measured  in  peacetime.  The 
results  suggest  that  the  ASVAB  can  continue  to  be  used  for  assigning 
recruits  to  the  Infantry  Rifleman  skill  with  reasonable  assurance  that 
the  aptitude  scores  are  related  to  performance  as  measured  by  these 
tests.

PREDICTABILITY  OF  COMPOSITE  PERFORMANCE  MEASURES 

The  previous  analyses  in  this  chapter  have  shown  that  the  perfor¬ 
mance  measures  are  reasonably  consistent  and  that  the  relevant  ASVAB 
aptitude  composites  generally  are  accurate  predictors  of  the  measures. 
With  the  performance  measures,  we  determine  how  well  people  perform  on 
the  job.  Through  the  pattern  of  ASVAB  validity  coefficients,  we  know 
the  types  of  people  that  have  the  aptitude  to  do  well  in  each  skill. 
People  with  relatively  high  aptitude  in  Electronics  Repair  can  be 
assigned to radio repair; those high in Mechanical Maintenance, to
automotive mechanics; and those high in Combat, to riflemen. The next
question to be addressed is whether combining the performance measures would
change  the  size  of  the  validity  coefficients  enough  to  affect  the 
minimum  qualifying  scores  for  assignment  to  the  skills. 

TABLE 13

VALIDITY OF ASVAB APTITUDE COMPOSITES: INFANTRY RIFLEMAN

Part A: Examinees Tested with ASVAB 8/9/10(a)

                         ASVAB aptitude composite

                      Uncorrected              Corrected
Performance
measure           CO(b)   GT(c)   CL(d)     CO      GT      CL

Hands-on test      .40     .37     .20      .58     .53     .41
Written test       .48     .55     .31      .69     .77     .51
Course grade       .13     .21    -.04      .29     .41     .08

Part B: Examinees Tested with ASVAB 5/6/7(e)

                      Uncorrected              Corrected
Performance
measure            CO      GT      CL       CO      GT      CL

Hands-on test      .31     .43     .24      .53     .64     .53
Written test       .28     .58     .39      .54     .77     .65
Course grade       (f)     (f)     (f)      (f)     (f)     (f)

(a) Number of cases is 53.
(b) Combat.
(c) General Technical.
(d) Clerical.
(e) Number of cases is 140.
(f) Not computed.

Ground  Radio  Repair 

The  predictive  validity  of  EL  is  shown  in  table  14.  The  perfor¬ 
mance  composites  are  reasonable  combinations  of  the  measures  that  might 
be  used  to  evaluate  performance.  The  three  performance  measures  are 
labeled  1,  2,  and  3,  and  the  composites  are  shown  as  combinations  of  the 
numbers.  The  composites  are  about  equally  predictable  by  EL;  the  valid¬ 
ity  coefficients  range  from  .76,  for  the  hands-on  plus  written  tests 
(1+2),  to  .82,  for  the  proficiency  tests  and  course  grades  (1+2+3 
and  2+3).  The  similarity  of  these  values  means  that  about  the  same 
level  of  minimum  qualifying  EL  score  would  be  established  against  each 
of  the  composite  performance  measures.  In  other  words,  about  the  same 
people  would  be  assigned  to  the  Ground  Radio  Repair  skill  no  matter 
which  composite  performance  measure  is  used  as  the  criterion  for  setting 
qualifying standards.

TABLE 14

VALIDITY OF ELECTRONICS REPAIR APTITUDE
COMPOSITE: GROUND RADIO REPAIR

                      Validity of Electronics Repair (EL)
                              aptitude composite

Performance measure        Uncorrected      Corrected

1 Hands-on test                .21             .59
2 Written test                 .34             .73
3 Course grade                 .43             .75

Composite:
1 + 2                          .36             .76
1 + 2 + 3                      .47             .82
2 + 3                          .48             .82


The composites were obtained by first standardizing each performance
measure to have a mean of 50 and a standard deviation of 10 and then
combining the standardized measures with equal weights. The combination
of hands-on test, written test, and course grade (1+2+3) was
standardized as a measure of proficiency. Because it is the most
comprehensive of the composites, this combination appears to be the best
measure of the skills and knowledge required to perform the job tasks.
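
The construction of an equally weighted composite can be sketched as
follows. This is a minimal illustration, not the computation used in
the study: the scores are invented, the function names are
hypothetical, and the use of the population (rather than sample)
standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(scores, target_mean=50.0, target_sd=10.0):
    """Rescale scores to the target mean and standard deviation."""
    m = mean(scores)
    s = pstdev(scores)  # assumption: population SD; the text does not say
    if s == 0:
        raise ValueError("scores have zero variance")
    return [target_mean + target_sd * (x - m) / s for x in scores]


def equal_weight_composite(*measures):
    """Standardize each measure, sum with equal weights, restandardize."""
    standardized = [standardize(m) for m in measures]
    summed = [sum(values) for values in zip(*standardized)]
    return standardize(summed)


# Invented scores for five hypothetical examinees.
hands_on = [12, 15, 9, 20, 17]
written = [40, 55, 35, 60, 50]
course_grade = [78, 85, 70, 92, 88]

composite_123 = equal_weight_composite(hands_on, written, course_grade)
composite_12 = equal_weight_composite(hands_on, written)
```

Standardizing before summing keeps a measure with a wide raw-score
range from dominating the composite, which is what equal weighting is
meant to ensure.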

As  we  discussed  in  chapter  1,  proficiency  tests  are  expensive  to 
develop  and  administer.  If  they  are  to  be  used  for  establishing  quali¬ 
fication  standards,  they  should  provide  information  not  available  from 
more  economical  performance  measures,  such  as  training  grades.  For  the 
Ground  Radio  Repair  skill  we  found  that  using  course  grades  alone  pro¬ 
duces  the  same  answer  for  establishing  qualification  standards  as  would 
using  the  proficiency  tests.  The  hands-on  test  is  less  predictable  by 
EL,  and  its  use  would  result  in  higher  qualification  standards  than 
would  use  of  either  the  written  proficiency  test  or  course  grades.  These 
results  do  not  indicate  a  need  in  the  Ground  Radio  Repair  skill  to  use  a 
performance  measure  different  from  the  traditional  course  grades  for 
establishing  qualification  standards. 

Automotive  Mechanic 


For  the  Automotive  Mechanic  skill,  course  grades  are  much  more 
predictable  by  MM  than  are  the  other  performance  measures  (table  15). 
Performance  composites  that  include  course  grades  are  more  predictable 
than  those  without.  Because  of  their  lower  predictability,  the  use  of 
proficiency tests for establishing minimum qualification scores could
result  in  somewhat  higher  ASVAB  qualification  standards  than  use  of  only 
course  grades. 


TABLE 15

VALIDITY OF MECHANICAL MAINTENANCE
APTITUDE: AUTOMOTIVE MECHANIC

                    Validity of Mechanical Maintenance (MM)
                              aptitude composite

Performance measure        Uncorrected      Corrected

1 Hands-on test                .49             .56
2 Written test                 .49             .65
3 Course grade                 .73             .83

Composite:
1 + 2                          .60             .71
1 + 2 + 3                      .72             .82
2 + 3                          .69             .81


The  corrected  validity  coefficient  for  predicting  course  grades 
(.83)  is  somewhat  higher  than  normally  found.  Grades  in  the  Automotive 
Mechanic  course  are  among  the  most  predictable  of  all  courses  in  the 
Marine Corps. In a recent CNA study [3], the validity of the MM aptitude
composite against course grades was .64. The .83 found in this
study  is  higher  than  the  .64  but  not  different  enough  to  discredit  the 
results.  Course  grades  appear  to  be  an  adequate  criterion  measure  in 
this  skill  for  establishing  qualification  standards. 

Infantry  Rifleman 


The  validity  of  the  CO  aptitude  composite  was  computed  for  two 
groups  of  examinees:  those  tested  with  ASVAB  5/6/7  and  those  tested 
with  ASVAB  8/9/10  (table  16).  For  the  Infantry  Rifleman  skill,  the 
hands-on  and  written  proficiency  tests  are  more  predictable  than  course 
grades.  The  most  predictable  performance  composite  is  the  sum  of  the 
hands-on  and  written  scores  (r  =  .72  in  the  group  tested  with 
ASVAB  8/9/10).  Adding  course  grades  lowered  the  predictive  validity. 

The  most  comprehensive  performance  composite  (1+2+3)  has  a  correla¬ 
tion  coefficient  with  CO  of  .64  (in  the  group  tested  with  ASVAB  8/9/10). 
The  differences  in  validity  coefficients  mean  that  different  minimum 
qualification  standards  would  be  set  for  different  performance  measures. 
Because  the  proficiency  tests  are  more  predictable  by  CO,  they  would 
result in lower ASVAB qualification standards than would use of course
grades.


TABLE 16

VALIDITY OF COMBAT APTITUDE COMPOSITE: INFANTRY RIFLEMAN

                     Validity of Combat (CO) aptitude composite

                        Group 1(a)                Group 2(b)
Performance
measure            Uncorrected  Corrected   Uncorrected  Corrected

1 Hands-on test        .31         .53          .40         .58
2 Written test         .28         .54          .48         .69
3 Course grade         (c)         (c)          .13         .29

Composite:
1 + 2                  .35         .59          .52         .72
1 + 2 + 3              (c)         (c)          .43         .64
2 + 3                  (c)         (c)          .35         .57

(a) Examinees tested with ASVAB 5/6/7; number of cases is 140.
(b) Examinees tested with ASVAB 8/9/10; number of cases is 53.
(c) Not computed.


SUMMARY 

In this chapter we examined the consistency among the three
measures of performance (hands-on test, written test, and course grades).
In general, we found that the intercorrelations among them indicate they
tend to be measuring the same thing. Prior to using the hands-on and
written test scores in any analyses, we computed the intercorrelations
among the parts of each test. We found that the parts tended to be
positively intercorrelated. The statistical analysis found no reasons
to drop any of the measures.

We  then  correlated  ASVAB  aptitude  composites  with  the  performance 
measures.  In  general,  the  relevant  aptitude  composites  had  the  highest 
predictive  validity  against  the  performance  measures  (EL  for  the  Ground 
Radio  Repair  skill;  MM  for  the  Automotive  Mechanic  skill;  and  CO  for  the 
Infantry  Rifleman  skill).  The  hands-on  tests  for  Radio  Repair  and 
Rifleman  skills,  however,  were  more  predictable  by  GT,  a  measure  of 
academic  aptitude,  than  by  EL  and  CO,  respectively.  The  hands-on  test 
scores  for  Radio  Repair  and  Mechanic  skills  were  found  to  be  suspect 
because they were piled up at the high end. The test scores for
riflemen may have a large memory component.

Finally,  we  developed  composites  of  the  performance  measures.  For 
the  two  technical  skills,  use  of  the  hands-on  and  written  proficiency 
tests  as  criterion  measures  would  give  the  same  results  as  course 
grades.  For  the  Infantry  Rifleman  skill,  however,  the  proficiency  tests 
were  more  predictable  than  course  grades. 

The  analyses  in  this  chapter  did  not  attempt  to  establish  minimum 
qualifying  scores.  All  they  were  intended  to  do  was  establish  the 
credibility  of  the  measures  for  evaluating  job  performance.  In  the  next 
chapter,  we  use  the  performance  measures  to  compute  minimum  qualifying 
scores.

CHAPTER 3


ESTABLISHMENT  OF  ASVAB  QUALIFICATION  STANDARDS 
FROM  THE  PERFORMANCE  MEASURES 


INTRODUCTION 

The  relationship  between  selection  and  classification  test  bat¬ 
teries,  such  as  the  ASVAB,  and  measures  of  performance  has  long  been 
used  in  the  personnel  testing  tradition  for  setting  qualification 
standards.  In  this  chapter  we  outline  the  model  that  has  guided  use  of 
selection  and  classification  tests  in  military  personnel  decisions.  We 
then  compute  a  set  of  qualifying  standards  on  the  ASVAB  using  the  avail¬ 
able  data  in  this  study,  supplemented  with  assumptions  about  the  meaning 
of  the  performance  measure  scores.  We  close  the  chapter  by  presenting 
the  percentage  of  satisfactory  performers  in  ASVAB  score  intervals.  The 
results  presented  in  this  chapter  do  not  consider  costs  of  recruiting 
applicants  when  setting  qualification  standards,  or  the  cost  of 
rejecting  people  who  fail  to  meet  the  qualification  standards  but  who 
would,  if  accepted,  perform  satisfactorily  on  the  criterion  measure.  A 
more  thorough  cost-effectiveness  analysis  of  qualification  standards  is 
the  subject  of  a  follow-on  research  effort. 

The  model  that  has  guided  use  of  selection  and  classification  tests 
by  the  military  services  may  be  characterized  as  follows.  The  ASVAB  is 
used  to  provide  information  to  personnel  decision  makers  about  how  well 
potential  recruits  are  expected  to  perform  in  the  variety  of  military 
jobs.  Because  most  applicants  for  enlistment  have  limited  job  experi¬ 
ence,  and  the  military  services  have  such  a  broad  range  of  skills  open 
to  recruits,  personnel  decision  makers  need  an  accurate  and  efficient 
way  to  predict  how  well  applicants  can  perform  across  the  range  of  mili¬ 
tary  skills.  The  ASVAB  is  generally  accepted  by  personnel  managers  as 
an  adequate  predictor  of  performance  in  the  military.  Based  on  ASVAB 
scores  and  other  information,  applicants  are  judged  to  be  qualified  for 
service  or  not.  If  their  predicted  performance  is  in  the  satisfactory 
range,  then  they  are  said  to  be  qualified.  If  their  predicted 
performance  is  unsatisfactory,  then  the  applicants  are  judged  to  be 
unqualified  for  enlistment. 

The  model  requires  three  essential  pieces  of  information.  First, 
the  ASVAB  must  be  a  valid  predictor  of  performance  in  the  military.  If 
the  ASVAB  is  a  poor  predictor,  then  selection  and  classification  deci¬ 
sions  based  on  ASVAB  scores  are  close  to  random,  and  the  predicted 
performance  of  those  who  qualify  on  the  ASVAB  differs  little  from  that 
of  those  who  are  unqualified.  Only  to  the  extent  that  the  ASVAB  is  an 
accurate  predictor  do  qualifying  standards  result  in  improving  the 
performance  of  people  accepted  for  service.  Fortunately  the  results  in 
chapter  2  support  the  predictive  validity  of  the  ASVAB,  and  qualifying 
standards  can  be  set  with  reasonable  confidence. 
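
The dependence of selection benefit on validity can be made concrete
with a short sketch. It assumes that aptitude and performance are
standardized and jointly normal with a linear relationship; this is an
illustrative assumption, not a result from this study. Under that
assumption, the expected performance of people at or above a cutting
score equals the validity coefficient times the mean aptitude of the
selected group. The function name is hypothetical.

```python
from statistics import NormalDist


def mean_performance_of_selected(validity, cut_z):
    """Expected standardized performance of people scoring above cut_z.

    validity: correlation between aptitude and performance.
    cut_z: qualifying score in standard-score units of the aptitude measure.
    Assumes bivariate normality (an assumption of this sketch).
    """
    std = NormalDist()
    selection_ratio = 1.0 - std.cdf(cut_z)
    # Mean of a standard normal variable truncated below at cut_z.
    mean_aptitude_of_selected = std.pdf(cut_z) / selection_ratio
    return validity * mean_aptitude_of_selected


# With zero validity, selection does not change expected performance;
# with higher validity, the selected group is expected to perform better.
gain_random = mean_performance_of_selected(0.0, 0.0)
gain_valid = mean_performance_of_selected(0.75, 0.0)
```

The gain shrinks toward zero as validity falls, which is the sense in
which decisions based on a poor predictor are close to random.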

The second piece of information relates to the difference between
satisfactory  and  unsatisfactory  levels  of  performance.  Somehow  a  score 
must  be  set  on  the  performance  measure  that  demarcates  satisfactory  and 
unsatisfactory  performance.  The  score  can  be  set  directly  on  the  per¬ 
formance  measure  itself,  such  as  specifying  the  number  of  items  or  tasks 
that  must  be  passed;  or,  the  satisfactory  score  can  be  set  indirectly  by 
specifying  the  percentage  of  the  population  that  would  perform  satis¬ 
factorily  and  then  setting  the  minimum  satisfactory  score  on  the 
performance  measure  accordingly. 

The  third  piece  of  information  concerns  an  acceptable  rate  of 
unsatisfactory  performance  among  those  who  meet  the  qualifying  standards 
on  ASVAB.  Because  the  ASVAB,  as  any  selection  and  classification  test, 
does  not  predict  performance  perfectly,  some  people  who  qualify  on  the 
ASVAB  will  subsequently  have  unsatisfactory  performance  scores.  The 
services  perforce  must  live  with  recruits  who  are  unsatisfactory.  A 
traditional  practice  in  the  military  is  to  decide  on  an  acceptable 
failure  rate  in  skill  training  courses. 

With  these  three  pieces  of  information — validity  of  the  ASVAB, 
satisfactory  score  on  the  performance  measure,  and  acceptable  failure 
rate — ASVAB  qualifying  scores  can  be  set.  In  the  next  subsection,  we 
compute  a  set  of  qualifying  scores  on  the  ASVAB. 

COMPUTING  QUALIFYING  APTITUDE  COMPOSITE  SCORES 

Satisfactory  Score  on  the  Performance  Measures 


In  this  study  we  made  an  assumption  about  the  percentage  of  the 
population  that  would  be  satisfactory  performers  in  each  skill.  As  a 
rule,  this  type  of  assumption  is  more  plausible  than  attempting  to  set 
the  satisfactory  score  directly  on  the  performance  measure  itself.  The 
former  assumption  requires  only  that  we  know  something  about  the  diffi¬ 
culty  of  the  skill  compared  to  other  jobs.  The  assumption  about  setting 
a  cutting  score  on  the  performance  measure  requires  that  we  know  how  the 
content  of  the  performance  measure  is  related  to  the  full  set  of  job 
requirements,  and  further  that  a  meaningful  and  unambiguous  demarcation 
can  be  made  between  satisfactory  and  unsatisfactory  scores  on  the  per¬ 
formance  measure.  Establishing  an  a  priori  satisfactory  score  on  the 
performance  measure,  called  "criterion-referenced  standards"  in  testing 
jargon,  implies  an  absolute  level  of  performance  that  is  unaffected  by 
testing  conditions  or  by  the  difficulty  of  the  test.  Because  our  per¬ 
formance  measures  are  experimental,  we  would  be  especially  reluctant  to 
establish  a  priori  satisfactory  scores  on  them. 

The  percentages  of  satisfactory  performers  for  the  Radio  Repair  and 
Mechanic  skills  were  obtained  from  data  for  the  World  War  II  (WWII)  era. 
The  Army  General  Classification  Test  (AGCT)  scores  were  computed  for 
soldiers  grouped  by  their  former  civilian  occupation  [4].  AGCT  score 
distributions  were  computed  for  people  who  were  radio  repairers  and 
automobile mechanics. The mean AGCT score for radio repairers was 117,
which  is  about  one  standard  deviation  above  the  population  mean  of  100. 
We  assumed  that  the  bulk  of  radio  repairers  were  satisfactory,  but  that 
the  bottom  of  the  distribution  was  unsatisfactory.  We  selected  the 
point  one  standard  deviation  below  the  mean  performance  of  all  radio 
repairers  as  the  cutting  score  to  demarcate  satisfactory  and  unsatis¬ 
factory  performance;  this  point  is  about  the  population  mean  of  the  AGCT 
scores.  Thus,  we  assume  that  half  the  population  could  be  satisfactory 
radio  repairers. 

For  automobile  mechanics,  the  mean  AGCT  score  of  WWII  soldiers  who 
were  mechanics  before  entering  the  Army  was  102,  close  to  the  population 
mean.  The  AGCT  score  one  standard  deviation  below  the  mean  of  the  auto¬ 
mobile  mechanic  sample  was  about  85,  which  corresponds  to  a  percentile 
score  of  about  25.  We  assumed  that  70  percent  of  the  population  could 
be  satisfactory  automotive  mechanics. 
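
The step from an AGCT cutting score to a percentage of the population
can be sketched with a normal approximation. The population mean of
100 is taken from the text; the standard deviation of 20 is an
assumption of this sketch (the text says only that 117 is about one
standard deviation above the mean), and the function name is
hypothetical.

```python
from statistics import NormalDist

AGCT_MEAN = 100.0  # population mean stated in the text
AGCT_SD = 20.0     # assumption: not stated explicitly in this section


def percent_at_or_above(agct_score):
    """Percentage of the population scoring at or above agct_score,
    assuming AGCT scores are normally distributed."""
    dist = NormalDist(AGCT_MEAN, AGCT_SD)
    return 100.0 * (1.0 - dist.cdf(agct_score))


# Radio repair: cutting score near the population mean.
radio_repair_pct = percent_at_or_above(100.0)

# Automotive mechanic: cutting score of about 85.
mechanic_pct = percent_at_or_above(85.0)
```

Under these assumptions the radio repair cutting score splits the
population in half, and the mechanic cutting score leaves roughly three
quarters of the population above it, in line with the percentile of
about 25 cited above.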

No comparable data are available for riflemen. We assumed that
80 percent of the population would be satisfactory riflemen. Based on
the experience of the Marine Corps and Army during and since WWII,
virtually all males eligible to serve can be trained to become riflemen.
The  primary  bar  to  being  a  satisfactory  rifleman  is  physical  ability. 
Some  mental  standards,  as  measured  by  the  ASVAB,  also  apply.  Congress 
has  established  that  the  bottom  10  percent  of  the  population  on  ASVAB 
cannot  be  inducted  during  mobilization.  Because  even  riflemen  should 
have  minimal  literacy  skills  to  cope  with  their  job  requirements,  the 
Marine  Corps  and  Army  prefer  to  maintain  somewhat  higher  standards  for 
assignment  to  infantry  jobs.  Our  assumption  that  80  percent  could  be 
satisfactory  riflemen  applies  to  those  who  are  physically  able. 

In  summary,  the  assumptions  we  made  about  the  percentage  of  the 
population  that  would  be  satisfactory  in  each  skill  are: 

•  Ground  Radio  Repair — 50  percent  of  the  population  could  be 
trained  to  be  satisfactory  performers,  which  implies  that 
under  normal  circumstances  50  percent  would  be 
unsatisfactory performers.

•  Automotive  Mechanic — 70  percent  would  be  satisfactory,  and 
30  percent  unsatisfactory. 

•  Infantry  Rifleman — 80  percent  satisfactory  and  20  percent 
unsatisfactory. 

Acceptable  Rate  of  Unsatisfactory  Performance 


A  policy  decision  about  the  cost  of  obtaining  satisfactory  per¬ 
formers  must  be  made  by  any  employer  who  builds  a  work  force.  The 
military  services  spend  large  amounts  of  money,  in  the  billions  each 
year, to recruit and train enlisted personnel. If the enlistment
standards and prerequisites for assignment to skill training courses are low,
recruiting  costs  are  relatively  low,  but  training  costs  are  high. 
Conversely,  if  qualification  standards  are  high,  recruiting  costs  are 
high and training costs low. Setting qualification standards therefore
always involves weighing the costs of obtaining the requisite number of
satisfactory performers.

For purposes of our analysis, we expressed the policy decision about
an acceptable rate of unsatisfactory performers in terms of an
acceptable failure rate during skill training. We assumed that the
failure  rate  should  not  exceed  10  percent.  This  value  is  a  reasonable 
average  across  all  Marine  Corps  training  courses.  Traditionally,  the 
failure  rate  in  the  Basic  Electronics  Course,  a  prerequisite  course  for 
training in radio repair, has exceeded 10 percent. In FY 1980 the
failure rate was 25 percent; this number includes failures for all
reasons, both academic and nonacademic (such as physical disability).
The failure rate
in  the  Basic  Automotive  Mechanic  course  has  been  around  10  percent;  in 
FY  1980,  13  percent  of  the  input  failed  for  all  reasons.  Failure  rates 
for  Infantry  Rifleman  traditionally  have  been  less  than  10  percent.  In 
FY  1980,  it  was  5  percent,  but  none  of  the  failures  were  for  academic 
reasons.  In  FY  1980,  about  half  the  Marine  Corps  courses  had  failure 
rates  below  10  percent,  and  about  half  were  above  10  percent  [3].  An 
acceptable  failure  rate  of  10  percent  for  Marine  Corps  courses  appears 
reasonable.

An  additional  minor  assumption  facilitates  the  computation  of 
minimum  qualifying  standards  for  each  skill.  If  we  assume  that  the 
performance  measure  scores  are  normally  distributed  and  that  they  are 
normally  distributed  in  each  aptitude  composite  score  interval,  then 
conventional  statistical  tables  can  be  used  in  the  analysis.  This 
assumption,  too,  is  reasonable. 

Validity  of  the  ASVAB 


The  validity  of  the  ASVAB  depends  on  the  criterion  measure  the 
battery  is  being  validated  against.  As  we  discussed  in  the  Introduc¬ 
tion,  the  benchmark  performance  measure  is  the  hands-on  performance 
test.  As  a  first  step  in  setting  qualification  standards,  we  need  to 
compute  standards  against  this  measure.  Hands-on  performance  tests  by 
themselves,  however,  sample  only  a  limited  portion  of  the  job  require¬ 
ments  in  a  skill.  A  more  comprehensive  criterion  measure  can  usually  be 
obtained  by  combining  the  hands-on  and  written  proficiency  tests.  The 
combination of hands-on and written tests, we believe, provides a better
measure of job performance than either one alone. We used the following
validity  coefficients  in  our  analysis  to  compute  ASVAB  qualification 
standards.  The  coefficients  are  the  population  estimates  rounded  to  the 
nearest .05.

                                 Validity
                         -------------------------
                                       Hands-on
Skill                    Hands-on     and written

Ground Radio Repair        .60            .75
Automotive Mechanic        .55            .70
Infantry Rifleman          .60            .70

Qualifying  Aptitude  Composite  Scores 

Given these validity coefficients and the a priori assumptions about
the percent of the population that would be satisfactory performers and
about the acceptable failure rate, the qualifying aptitude composite scores
can  be  determined  by  table  lookup.  A  set  of  tables,  called  the 
Taylor-Russell  tables  [5],  shows  the  expected  failure  rates  for  combina¬ 
tions  of  the  values.  In  table  17,  we  present  values  taken  from  the 
Taylor-Russell  tables  that  are  relevant  to  this  study. 

The values in table 17 assume a bivariate normal distribution of
aptitude composites and performance measures. We used the 1980
score  scale  for  the  aptitude  composite  because  the  1980  scale  more  accu¬ 
rately  reflects  the  current  population  of  potential  recruits. 
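
As a rough cross-check on values of this kind, an expected failure rate can be approximated directly from the bivariate normal assumption. The Python sketch below is an illustration only and is not the procedure used in the study, which relied on the published tables. The function name, the upper integration limit, and the number of integration steps are assumptions made for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def expected_failure_rate(base_rate, selection_ratio, validity, steps=2000):
    """Approximate P(unsatisfactory | qualified) under a bivariate normal model.

    base_rate       -- proportion of the population that would be satisfactory
    selection_ratio -- proportion of the population meeting the qualifying score
    validity        -- correlation between aptitude composite and performance
    steps           -- number of Simpson intervals (must be even; an assumption)
    """
    # Cut points on the standardized performance and predictor scales.
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    resid_sd = math.sqrt(1.0 - validity ** 2)

    def integrand(x):
        # Density of the predictor at x times the chance that performance
        # falls below the satisfactory cut, given that predictor value.
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)

    # Simpson's rule from the qualifying score to 8 SDs above the mean.
    upper = 8.0
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    joint_fail = total * h / 3.0  # P(qualified and unsatisfactory)

    return joint_fail / selection_ratio


if __name__ == "__main__":
    # Ground Radio Repair: 50 percent satisfactory, 25 percent qualified, validity .75
    print(round(expected_failure_rate(0.50, 0.25, 0.75), 2))
    # Automotive Mechanic: 70 percent satisfactory, 60 percent qualified, validity .70
    print(round(expected_failure_rate(0.70, 0.60, 0.70), 2))
```

Results from a direct computation like this may differ from the tabled values by a point or so, since the percent qualified in table 17 is rounded to the nearest 5.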

Part A of table 17 should be used for determining the qualifying
aptitude composite score for the Ground Radio Repair skill (50 percent
of the population is satisfactory). With a validity coefficient of .75
for the hands-on plus written test, the expected failure rate for an EL
score of 115 (about 25 percent of the population would be qualified on
EL) is 11 percent; for an EL score of 120 (about 15 percent qualified on
EL) the expected failure rate is 6 percent. With a validity coefficient
of .60 for the hands-on test by itself, at an EL score of 120 the
expected failure rate is 13 percent, which is well above our assumed
acceptable rate of 10 percent. Use of the hands-on test as the criterion
measure would result in a higher qualification standard than the
combined hands-on plus written test. For radio repairers, the
relationship among validity, EL qualification score, and percent
failures is summarized as follows:


                          Percent
Validity    EL score     failures

  .75         115           11
  .75         120            6
  .60         120           13

A reasonable qualifying standard for assigning recruits to the Ground
Radio Repair skill, using the combined hands-on plus written proficiency
tests as the criterion, is 115.


                               TABLE 17

               EXPECTED FAILURE RATES(a) FOR QUALIFYING
                      APTITUDE COMPOSITE SCORES

Part A: Percent of Population Satisfactory Performers = 50
(Ground Radio Repair)

Aptitude(b)                        Validity coefficient
composite   Percent(c)
score       qualified     .50    .55    .60    .65    .70    .75    .80

  80           85         .45(d) .44    .43    .43    .42    .42    .42
  85           75         .42    .41    .40    .38    .37    .36    .36
  90           70         .40    .39    .38    .36    .35    .34    .33
  95           60         .37    .35    .34    .32    .30    .28    .27
 100           50         .33    .31    .30    .27    .25    .23    .20
 105           40         .30    .28    .25    .23    .20    .18    .15
 110           30         .26    .24    .21    .18    .15    .13    .10
 115           25         .24    .22    .19    .16    .13    .11    .08
 120           15         .19    .16    .13    .11    .08    .06    .03

Part B: Percent of Population Satisfactory Performers = 70
(Automotive Mechanic)

  80           85         .25    .24    .23    .22    .22    .21    .20
  85           75         .22    .21    .20    .19    .18    .17    .16
  90           70         .20    .19    .18    .17    .16    .14    .13
  95           60         .18    .17    .15    .14    .12    .11    .09
 100           50         .16    .14    .13    .11    .09    .08    .06
 105           40         .13    .12    .10    .08    .07    .05    .03
 110           30         .11    .09    .08    .06    .04    .03    .02
 115           25         .10    .08    .07    .05    .04    .03    .02
 120           15         .07    .05    .04    .03    .02    .01    .01

(a) Derived from Taylor-Russell tables [5].
(b) 1980 score scale.
(c) Assume normal distribution of aptitude composite scores; rounded to
    nearest 5.
(d) Failure rate shown in cells of table.


                          TABLE 17 (Cont'd)

Part C: Percent of Population Satisfactory Performers = 80
(Infantry Rifleman)

Aptitude(b)                        Validity coefficient
composite   Percent(c)
score       qualified     .50    .55    .60    .65    .70    .75    .80

  80           85         .15    .15    .14    .13    .12    .12    .11
  85           75         .13    .12    .11    .10    .09    .09    .08
  90           70         .12    .11    .10    .09    .08    .07    .06
  95           60         .10    .09    .08    .07    .06    .05    .04
 100           50         .09    .08    .06    .05    .04    .03    .02
 105           40         .07    .06    .05    .04    .03    .02    .01
 110           30         .06    .05    .04    .03    .02    .01    .00
 115           25         .05    .04    .03    .03    .02    .01    .00
 120           15         .04    .03    .02    .01    .01    .00    .00

For the Automotive Mechanic skill, part B of table 17 should be
used (percent of satisfactory performers is 70). With a validity
coefficient of .70 for the hands-on plus written tests, the expected
failure rate is 12 percent at an MM score of 95 (about 60 percent of the
population would be qualified on MM) and 9 percent at an MM score of 100
(50 percent qualified on MM). With a validity coefficient of .55 for
the hands-on test by itself, expected failure rates of 12 and
9 percent occur at MM scores of 105 and 110, respectively. For
automotive mechanics, the relationship among validity, MM qualification
score, and percent failures is summarized as follows:

                          Percent
Validity    MM score     failures

  .70          95           12
  .70         100            9
  .55         105           12
  .55         110            9

A reasonable qualifying standard for the Automotive Mechanic skill,
using the hands-on plus written tests as the criterion measure, is an MM
score of 95.

For  the  Infantry  Rifleman  skill,  we  assumed  that  80  percent  of  the 
population  would  be  satisfactory  performers  (part  C  of  table  17).  With 
a  validity  coefficient  of  .70,  for  the  hands-on  plus  written  tests,  the 
expected failure rate is 12 percent when the qualifying CO score is 80
(about 85 percent of the population would be qualified on CO) and
9 percent when the qualifying CO score is 85 (about 75 percent qualified
on CO). With a validity coefficient of .60 for the hands-on test by
itself, the expected failure rate is 10 percent at a CO score of 90. For
infantrymen, the relationship among validity, CO qualification score,
and percent failures may be summarized as follows:


                          Percent
Validity    CO score     failures

  .70          80           12
  .70          85            9
  .60          90           10

A  reasonable  qualifying  standard  for  assignment  to  the  Infantry  Rifleman 
skill,  using  the  hands-on  plus  written  tests  as  the  criterion  measure, 
is  a  CO  score  of  85. 

The qualifying standards based on the combination of hands-on plus
written proficiency tests for the three skills agree closely with the
current  Marine  Corps  standards.  The  two  sets  of  standards  were  derived 
independently,  and  their  correspondence  supports  their  reasonableness. 
The  current  standards  are  based  on  the  validation  data  collected  in  1978 
and  1979.  Failure  rates  during  FY  1980  for  the  skill  training  courses 
were  used  to  help  set  the  current  Marine  Corps  qualifying  aptitude 
composite  scores  [3]. 

The  current  qualifying  EL  score  for  assignment  to  the  Ground  Radio 
Repair  skill  is  115,  which  agrees  exactly  with  our  preferred  value.  The 
current  qualifying  MM  score  for  assignment  to  the  Automotive  Mechanic 
skill  is  90  for  high  school  graduates  and  100  for  non-high  school  gradu¬ 
ates.  Our  MM  value  is  95,  the  average  of  the  current  values.  The  cur¬ 
rent  qualifying  CO  score  for  the  Infantry  Rifleman  skill  is  80  for  high 
school  graduates  and  90  for  nongraduates.  Again,  our  CO  score  of  85  is 
the  average  of  the  current  standards.  The  correspondence  between  the 
two  sets  of  standards  does  not,  of  course,  prove  that  they  are  right;  it 
only  enhances  their  plausibility. 

Comparison of parts A, B, and C in table 17 shows that the expected
failure rates are sensitive to the assumed percentage of the population
that would be satisfactory performers. For difficult skills, with only
50 percent of the population satisfactory, the failure rate is
substantially higher, other things equal, than for easier skills, with
70 or 80 percent of the population satisfactory. The failure rate also
increases as the qualifying aptitude composite score is lowered, that
is, as a larger percentage of the population has qualifying scores. For
example, with a validity coefficient of .75 for the Ground
Radio  Repair  course  (for  which  50  percent  of  the  population  would  be 
satisfactory  performers)  and  a  qualifying  EL  score  of  90,  the  expected 
failure  rate  is  .34;  the  failure  rate  is  only  .11  with  a  qualifying  EL 
score of 115. The effect of increased validity on the failure rate is
much smaller. As a rule, increasing the validity by .05 lowers the
failure  rate  by  a  maximum  of  3  percentage  points  (table  17),  other 
things  equal.  Training  schools  have  long  known  that  the  best  way  to 
reduce  failure  rates  is  to  raise  entrance  standards.  In  fact,  tradi¬ 
tionally  training  schools  have  argued  for  higher  standards  to  reduce 
their  failure  rates. 
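
The two comparisons in this paragraph can be checked against part A of table 17. The short Python sketch below copies the part A entries as whole percentages; the function names are illustrative assumptions, not part of the study.

```python
# Part A of table 17 (50 percent of the population satisfactory).
# Keys are qualifying EL scores; values are expected failure rates in
# percent for validity coefficients .50, .55, .60, .65, .70, .75, .80.
VALIDITIES = [50, 55, 60, 65, 70, 75, 80]  # validity coefficients x 100
PART_A = {
    80:  [45, 44, 43, 43, 42, 42, 42],
    85:  [42, 41, 40, 38, 37, 36, 36],
    90:  [40, 39, 38, 36, 35, 34, 33],
    95:  [37, 35, 34, 32, 30, 28, 27],
    100: [33, 31, 30, 27, 25, 23, 20],
    105: [30, 28, 25, 23, 20, 18, 15],
    110: [26, 24, 21, 18, 15, 13, 10],
    115: [24, 22, 19, 16, 13, 11, 8],
    120: [19, 16, 13, 11, 8, 6, 3],
}


def max_drop_per_validity_step(table):
    """Largest fall in failure rate between adjacent validity columns."""
    return max(
        row[i] - row[i + 1]
        for row in table.values()
        for i in range(len(row) - 1)
    )


def failure_rate(table, score, validity_pct):
    """Look up one cell by qualifying score and validity (x 100)."""
    return table[score][VALIDITIES.index(validity_pct)]


if __name__ == "__main__":
    # Effect of a .05 increase in validity: at most 3 percentage points.
    print(max_drop_per_validity_step(PART_A))  # 3
    # Effect of raising the EL standard from 90 to 115 at validity .75.
    print(failure_rate(PART_A, 90, 75) - failure_rate(PART_A, 115, 75))  # 23
```

In part A, raising the qualifying score from 90 to 115 changes the expected failure rate by 23 percentage points, compared with at most 3 points for a .05 step in validity.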

The  Taylor-Russell  tables  can  be  used  to  determine  the  effects  of 
various  qualifying  standards  on  failure  rates.  If  the  difficulty  of  a 
skill  is  assumed  to  have  a  different  value,  then  the  expected  failure 
rate  for  a  given  qualifying  aptitude  composite  score  will  also  change. 

Some  Issues  in  Setting  Qualifying  Standards 


The  setting  of  qualifying  standards  is  a  complex  process  that 
requires  input  from  several  disciplines,  but  in  the  final  analysis  it  is 
a  matter  of  expert  judgment.  The  fundamental  requirement  is  that  the 
selection  and  classification  tests  and  other  selection  standards  have 
predictive validity. Insofar as feasible, the desired outcome is that
people who meet the qualifying standards become satisfactory performers
(called "true positives") while those who fail to qualify would be
unsatisfactory performers (called "true negatives"). This outcome is a
direct function of the validity of the instruments used to set
qualifying standards: the higher the validity, the more accurate the
predictions.

No qualifying standards are perfectly valid, and the cost of
misclassifying people is an important issue in setting standards. One cost
that  was  considered  explicitly  in  our  analyses  is  that  of  accepting 
recruits  who  become  unsatisfactory  performers.  These  people  are  some¬ 
times  called  "false  positives."  The  services  traditionally  have 
attempted  to  minimize  this  cost  by  controlling  the  failure  rate  in  skill 
training  courses.  A  cost  that  remains  hidden  is  that  of  excluding 
people  who  fail  on  the  qualifying  standards  but  who  would  become  satis¬ 
factory  performers  if  they  were  enlisted.  They  are  sometimes  called 
"false  negatives."  Because  the  false  negatives  are  not  allowed  to 
enlist,  their  potential  contribution  cannot  be  realized.  The  percentage 
of  false  positives  and  false  negatives  is  related  to  the  validity  of  the 
qualifying  standards — the  higher  the  validity,  the  smaller  the 
percentage. 
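
The four outcomes described above can be illustrated numerically. The Python sketch below assumes the same bivariate normal model that underlies table 17 and splits the population into true positives, false positives, false negatives, and true negatives. It is an illustration under that assumption, not a computation reported in the study; the function name and the integration settings are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def selection_outcomes(base_rate, selection_ratio, validity, steps=2000):
    """Return the four outcome proportions under a bivariate normal model.

    base_rate       -- proportion of the population that would be satisfactory
    selection_ratio -- proportion of the population that meets the standard
    validity        -- correlation between predictor and performance
    steps           -- number of Simpson intervals (must be even; an assumption)
    """
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)        # minimum satisfactory performance
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)  # qualifying score
    resid_sd = math.sqrt(1.0 - validity ** 2)

    def integrand(x):
        # Predictor density at x times P(performance below y_cut | x).
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)

    # Simpson's rule from the qualifying score to 8 SDs above the mean
    # gives P(qualified and unsatisfactory), the false positives.
    upper = 8.0
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    false_pos = total * h / 3.0

    true_pos = selection_ratio - false_pos
    false_neg = base_rate - true_pos
    true_neg = 1.0 - selection_ratio - false_neg
    return {
        "true_positive": true_pos,
        "false_positive": false_pos,
        "false_negative": false_neg,
        "true_negative": true_neg,
    }


if __name__ == "__main__":
    # Ground Radio Repair assumptions: 50 percent satisfactory, 25 percent qualified.
    for validity in (0.60, 0.75):
        outcome = selection_outcomes(0.50, 0.25, validity)
        print(validity, {name: round(p, 3) for name, p in outcome.items()})
```

With the qualifying score held fixed, the higher validity gives smaller proportions of both false positives and false negatives, and the false negatives outnumber the false positives in this example.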

Personnel  psychologists  in  the  military  traditionally  have  been 
involved  in  developing  and  validating  selection  and  classification 
instruments. As input to setting enlistment standards, they
traditionally have used two procedures, which are affected in opposite
ways by the validity of the predictors. The simpler procedure is to
compute the predictor score that corresponds to the minimum satisfactory
performance score; minimum qualifying standards in this procedure are a
direct function of the regression line relating performance and
predictor. The
general outcome with this procedure is that the lower the validity, the
lower the qualifying standards. If a predictor has low validity, then
people with low scores perform about as well as people with high scores,
and there is no justification for high standards.
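
The regression procedure can be sketched in standardized units. If predicted performance is the validity times the standardized predictor score, then the predictor score whose predicted performance equals the minimum satisfactory score is that minimum divided by the validity. The Python example below is a simplified illustration: it assumes a predictor scale with mean 100 and standard deviation 20 and a minimum satisfactory score set from an assumed percent satisfactory. It does not use the study's own regression equations, and the function name is hypothetical.

```python
from statistics import NormalDist


def regression_cutoff(base_rate, validity, mean=100.0, sd=20.0):
    """Predictor score whose predicted performance equals the minimum
    satisfactory performance score (simple regression procedure).

    base_rate -- assumed proportion of the population that is satisfactory
    validity  -- correlation between predictor and performance
    mean, sd  -- predictor scale (assumed 100 and 20 for this illustration)
    """
    # Minimum satisfactory performance in standard deviation units.
    z_min_performance = NormalDist().inv_cdf(1.0 - base_rate)
    # Predicted z_performance = validity * z_predictor, so solve for z_predictor.
    z_predictor = z_min_performance / validity
    return mean + sd * z_predictor


if __name__ == "__main__":
    # With 70 percent of the population satisfactory, the cutoff falls
    # as validity falls.
    for validity in (0.70, 0.55):
        print(validity, round(regression_cutoff(0.70, validity), 1))
```

When the minimum satisfactory score lies below the population mean, as in this example, a lower validity moves the cutoff farther below the mean. When the minimum equals the mean (50 percent satisfactory), the cutoff stays at the predictor mean regardless of validity.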


The second procedure is the one we employed here, which considers
the cost of accepting false positives. This procedure involves the
ratio of satisfactory (true positives) and unsatisfactory (false
positives) performers among those who meet the qualifying standards.
Ordinarily the ratio is set by personnel managers. With this procedure,
which considers some costs, the effects of validity on standards are
the opposite of those using the simple regression procedure. As we found
when comparing the hands-on test by itself, which had lower validity, to
the combination of hands-on plus written test, higher validity resulted
in lower standards.


The  traditional  procedures  either  ignored  costs  (the  regression 
approach)  or  used  only  rudimentary  costs  (failure  rates).  During  the 
draft  environment,  when  procuring  people  was  relatively  easy,  recruiting 
costs  could  be  largely  ignored.  In  the  all-volunteer  environment,  where 
the  services  must  compete  with  civilian  employers  and  academic  institu¬ 
tions,  procurement  costs  are  substantial.  Another  complicating  factor 
in  setting  standards  arises  from  equal  employment  opportunities.  The 
question  of  false  negatives  assumes  greater  significance  for  racial/ 
ethnic  minorities  when  setting  qualifying  standards.  The  validity  of 
the  standards  is  still  the  fundamental  issue,  but  issues  of  cost  and 
even  social  policy  need  more  systematic  consideration. 

In  addition  to  personnel  psychologists,  economists  can  perform  an 
essential role by collecting cost data on recruiting people. Operations
research techniques are required to model various combinations of costs
and enlistment standards and to simulate their complex interactions.
Brogden [6] has developed the theoretical solution to evaluating
the  classification  efficiency  of  a  test  battery.  The  function  maximized 
in his solution is predicted performance of people assigned to the
various types of skills. The validity of the predictors and the
intercorrelation of the predicted performance scores are the dominant
factors that determine classification efficiency. When setting and
validating qualifying standards, the function to be maximized is still
predicted performance. Other factors, however, such as cost, attrition,
and perhaps social policy, also need to be considered when evaluating
the effects and feasibility of alternative standards [7].

PERCENT  SATISFACTORY  IN  EACH  ASVAB  SCORE  INTERVAL 

The  ASVAB  score  scale  traditionally  has  been  divided  into  inter¬ 
vals.  Even  though  the  ASVAB  score  scale  is  continuous,  personnel 
managers  often  treat  persons  with  ASVAB  scores  in  the  same  interval 
similarly  and  those  in  adjoining  intervals  differently.  Of  particular 
importance to DoD personnel managers are the AFQT categories. The AFQT
score scale is divided into five intervals or categories:

•  I and II, percentile scores 65 through 99: most commissioned
officers and many senior noncommissioned officers score in
these intervals.

•  III, percentile scores 31 through 64: qualification
standards on aptitude composites for assignment to skill
training typically correspond to this interval; enlistment
bonuses typically are restricted to recruits who have AFQT
scores of 50 or above (category IIIA).

•  IV, percentile scores 10 through 30: AFQT enlistment
standards usually are set in this interval, especially
during times of mobilization.

•  V, percentile scores 1 through 9: since World War II,
people in this interval have been unqualified to serve in
the military services.

Qualifying  aptitude  composite  scores  usually  have  been  set  at  90,  100, 
or  110,  where  the  population  mean  is  100  and  standard  deviation  is  20. 
Since  1980,  some  qualifying  aptitude  composite  scores  have  been  set  in 
intervals  of  5,  such  as  85,  90,  95.  For  administrative  convenience,  the 
score  intervals  are  used  in  personnel  decisions  rather  than  the  smaller 
1  or  2  point  intervals  in  which  the  scores  are  computed. 
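
Given a population mean of 100 and a standard deviation of 20, the percent of the population that qualifies at each cut score follows from the normal distribution. The Python sketch below rounds the result to the nearest 5, as the "Percent qualified" column of table 17 does. The function name is an illustrative assumption.

```python
from statistics import NormalDist

# Aptitude composite score scale: population mean 100, standard deviation 20.
COMPOSITE = NormalDist(mu=100, sigma=20)


def percent_qualified(cut_score, nearest=5):
    """Percent of the population at or above cut_score, rounded to `nearest`."""
    exact = 100.0 * (1.0 - COMPOSITE.cdf(cut_score))
    return nearest * round(exact / nearest)


if __name__ == "__main__":
    for score in range(80, 125, 5):
        print(score, percent_qualified(score))
```

For cut scores of 80 through 120 in steps of 5, this gives 85, 75, 70, 60, 50, 40, 30, 25, and 15 percent.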

In  this  subsection  we  compute  the  percent  of  satisfactory  per¬ 
formers  in  10  point  intervals  of  aptitude  composite  scores  and  in  the 
AFQT  categories.  The  statistical  computations  are  relatively  complex 
because  no  convenient  tables,  similar  to  the  Taylor-Russell  tables,  are 
available  for  computing  the  percentages.  We  describe  the  procedures  for 
computing  the  performance  score  that  demarcates  satisfactory- 
unsatisfactory  performance  in  some  detail.  Readers  who  prefer  to  make 
different  assumptions  about  the  percent  of  the  population  that  would  be 
satisfactory  performers  then  can  compute  different  minimum  satisfactory 
performance  scores,  which  would  change  the  percentage  satisfactory  in 
each  ASVAB  score  interval. 

Computing the Performance Score that Demarcates Satisfactory-Unsatisfactory Performance


First, we need to assume the percentage of the population that would
be satisfactory performers. We make the same assumptions as previously:

•  Ground Radio Repair skill: 50 percent

•  Automotive Mechanic skill: 70 percent

•  Infantry Rifleman skill: 80 percent.

Next, we need to estimate the standard deviations of the performance
measures in the population of potential recruits. The estimated
standard deviations are obtained from the corrections of the validity
coefficients (reported in appendix C). The estimated population
standard deviations are 16.40 for the Ground Radio Repair skill
(table 18), 11.33 for the Automotive Mechanic skill (table 19), and
11.69 for the Infantry Rifleman skill (table 20).

The  estimated  population  mean  of  the  performance  measures  is  the 
predicted  value  that  corresponds  to  the  aptitude  composite  score  of  100, 
the population mean on the ASVAB. The estimated population mean is 42
for radio repairers (table 18, part B), 50 for mechanics (table 19,
part B), and 52 for riflemen (table 20, part B).

To  find  the  performance  score  that  demarcates  satisfactory  and 
unsatisfactory  performance,  compute  the  performance  measure  score  that 
corresponds  to  the  percent  of  population  that  is  satisfactory  per¬ 
formers.  We  assume  that  the  performance  measures  are  normally  distrib¬ 
uted  in  the  population.  For  radio  repairers,  where  50  percent  of  the 
population  is  assumed  to  be  satisfactory,  the  minimum  satisfactory 
performance  score  is  42.  In  a  normal  distribution,  the  mean  corresponds 
to  a  percentile  score  of  50,  and,  hence,  the  estimated  population  mean 
(42)  is  the  minimum  satisfactory  performance  score. 

For mechanics, where 70 percent of the population is assumed to be
satisfactory, the minimum satisfactory performance score is 44. In a
normal distribution, about 70 percent of the population lies above the
point one-half standard deviation below the mean. The satisfactory-
unsatisfactory point, therefore, is one-half standard deviation below
the mean. The estimated population standard deviation of the
performance measure is 11.33 (table 19); one-half of this value rounds
to 6, which is subtracted from the estimated population mean of 50 to
give 44.

For  riflemen,  where  80  percent  of  the  population  is  assumed  satis¬ 
factory,  the  minimum  satisfactory  performance  score  is  42.  About 
80  percent  of  the  population  lies  above  the  point  .85  standard  deviation 
below  the  population  mean.  The  estimated  population  mean  is  52  and  the 
estimated  population  standard  deviation  is  11.69  (table  20).  The 
minimum  satisfactory  performance  score  for  riflemen,  therefore,  is 
42  (minimum  =  52  -  11.69  x  .85). 
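
The three minimum satisfactory performance scores can be reproduced with a few lines of Python. The sketch below uses the estimated population means and standard deviations quoted above and the exact normal deviate for each assumed percent satisfactory, rather than the rounded multipliers of one-half and .85. The names are illustrative assumptions.

```python
from statistics import NormalDist

# (estimated population mean, estimated population SD, proportion satisfactory)
SKILLS = {
    "Ground Radio Repair": (42.0, 16.40, 0.50),
    "Automotive Mechanic": (50.0, 11.33, 0.70),
    "Infantry Rifleman": (52.0, 11.69, 0.80),
}


def minimum_satisfactory_score(mean, sd, proportion_satisfactory):
    """Performance score above which the stated proportion of a normal
    population falls."""
    z = NormalDist().inv_cdf(1.0 - proportion_satisfactory)
    return mean + z * sd


if __name__ == "__main__":
    for skill, (mean, sd, proportion) in SKILLS.items():
        print(skill, round(minimum_satisfactory_score(mean, sd, proportion)))
```

Rounded to whole points, the results are 42, 44, and 42, in agreement with the values derived in the text.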

Computing  the  Percent  Satisfactory  in  Each  Score  Interval 


The  percent  satisfactory  in  each  ASVAB  score  interval  is  the  por¬ 
tion  that  falls  above  the  satisfactory  performance  score.  To  calculate 
the  percentage,  we  need  to  compute  the  distance,  in  standard  deviation 
units,  between  the  regression  line  and  the  satisfactory  score.  For 
convenience,  we  use  the  midpoint  of  the  interval,  and  we  assume  that  the 
performance  scores  are  normally  distributed  in  each  interval.  In  each 
interval, 50 percent is above the regression line at the midpoint, and
50 percent below. The standard deviation of performance scores in each
interval is the standard error of estimate, assuming homoscedasticity in
the regression of performance on the ASVAB. The standard errors of
estimate are shown in tables 18, 19, and 20. The ones we use in these
computations are: 8.99 for the Ground Radio Repair skill, 7.99 for the
Automotive Mechanic skill, and 8.59 for the Infantry Rifleman skill.

REGRESSION OF PERFORMANCE ON ASVAB - GROUND RADIO REPAIR
REGRESSION OF PERFORMANCE ON ASVAB - AUTOMOTIVE MECHANIC
REGRESSION OF PERFORMANCE ON ASVAB - INFANTRY RIFLEMAN

To  obtain  the  distance  in  standard  deviation  units  at  the  midpoint 
of  each  interval,  we  computed  the  difference  between  the  predicted  per¬ 
formance  score  and  the  satisfactory  score,  and  divided  the  distance  by 
the standard error of estimate. The computation of the percent
satisfactory in each ASVAB score interval is illustrated in figure 1. We
diagrammed  the  percentage  above  satisfactory  in  the  intervals  90-99 
(midpoint  is  95)  and  110-119  (midpoint  is  115).  The  predicted  perfor¬ 
mance  score  for  an  EL  score  of  95  is  40,  which  is  2  points,  or  .22 
(2/8.99)  standard  errors  of  estimate,  below  the  satisfactory  score.  In 
a  normal  distribution,  about  9  percent  lies  between  the  mean  and  the 
point  .22  standard  deviations  away  from  the  mean.  This  9  percent  is 
added  to  the  50  percent  below  the  regression  line.  Thus,  59  percent  in 
the  interval  90-99  has  unsatisfactory  performance  scores  and  41  percent 
has satisfactory scores. In the interval 110-119, 78 percent is
satisfactory. In this case, the regression line is .77 standard errors
of estimate above the satisfactory score, and the 28 percent that lies
between the two is added to the 50 percent above the regression line.
Computations for the other intervals are made in the same way.
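
The interval computation can be written compactly with the normal distribution function. The Python sketch below reproduces the two Ground Radio Repair examples. The predicted score for the 110-119 interval is not quoted in the text, so the example reconstructs it from the stated distance of .77 standard errors; that reconstruction and the function name are assumptions made for illustration.

```python
from statistics import NormalDist


def percent_satisfactory(predicted, minimum_satisfactory, se_estimate):
    """Percent of an interval scoring above the minimum satisfactory score,
    assuming scores in the interval are normal around the predicted score."""
    distance = (predicted - minimum_satisfactory) / se_estimate
    return 100.0 * NormalDist().cdf(distance)


if __name__ == "__main__":
    MINIMUM = 42.0      # minimum satisfactory score, Ground Radio Repair
    SE_ESTIMATE = 8.99  # standard error of estimate, Ground Radio Repair

    # Interval 90-99 (midpoint 95): predicted performance score is 40.
    print(round(percent_satisfactory(40.0, MINIMUM, SE_ESTIMATE)))  # 41

    # Interval 110-119 (midpoint 115): the regression line is stated to be
    # .77 standard errors above the satisfactory score.
    predicted_115 = MINIMUM + 0.77 * SE_ESTIMATE
    print(round(percent_satisfactory(predicted_115, MINIMUM, SE_ESTIMATE)))  # 78
```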

The percent satisfactory in each aptitude composite interval is
shown in figures 2, 3, and 4 for the Ground Radio Repair skill,
Automotive Mechanic skill, and Infantry Rifleman skill, respectively. In
figures 5, 6, and 7, we show the percent satisfactory for each AFQT
score interval. The AFQT intervals we used are:*

Category  IVA,  percentile  scores  21-30;  midpoint  is  25 

Category  IIIB,  percentile  scores  31-49;  midpoint  is  40 

Category  IIIA,  percentile  scores  50-64;  midpoint  is  58 

Category  II,  percentile  scores  65-92;  midpoint  is  78 

Category  I,  percentile  scores  93-99;  midpoint  is  96. 

These  AFQT  score  intervals  are  commonly  used  in  personnel  decisions. 


*  No percentages are shown in categories IVA and IIIB for the Ground
Radio Repair skill because of the large uncertainties in estimating
values at this distance from the sample mean.

FIG. 1: REGRESSION OF PERFORMANCE MEASURES ON ELECTRONICS REPAIR
APTITUDE COMPOSITE - RADIO REPAIR SPECIALTY

FIG. 2: PERCENT SATISFACTORY PERFORMERS BY EL APTITUDE
COMPOSITE - GROUND RADIO REPAIR

FIG. 3: PERCENT SATISFACTORY PERFORMERS BY MM APTITUDE
COMPOSITE - AUTOMOTIVE MECHANIC

FIG. 4: PERCENT SATISFACTORY PERFORMERS BY CO APTITUDE
COMPOSITE - INFANTRY RIFLEMAN

FIG. 5: PERCENT SATISFACTORY PERFORMERS BY AFQT
CATEGORY - GROUND RADIO REPAIR

FIG. 7: PERCENT SATISFACTORY PERFORMERS BY AFQT
CATEGORY - INFANTRY RIFLEMAN

The results clearly show that the percent of satisfactory performers
increases as the ASVAB scores increase. These percentages
reflect the relatively high validity coefficients of the ASVAB for
predicting performance on the proficiency tests.

SUMMARY 

In  this  chapter  we  computed  a  set  of  qualifying  ASVAB  scores  that 
were  validated  against  measures  of  job  proficiency.  These  standards 
agree  closely  with  the  current  standards  the  Marine  Corps  uses  to  assign 
recruits  to  these  skills.  The  results  show  that  enlistment  standards 
and  qualifying  standards  for  assigning  recruits  to  skills  can  be 
validated  against  job  performance. 

A  more  thorough  validation  of  standards  can  be  accomplished  by 
including  more  complex  cost  figures  and  performance  scores.  The  cost 
figures  could  include  costs  of  recruiting  people  to  enlist  in  the  Marine 
Corps  and  costs  of  attempting  to  train  recruits  who  prove  to  be  unsatis¬ 
factory  performers.  The  performance  scores  could  include  duration  of 
satisfactory  performance,  where  benefit  to  the  Marine  Corps  increases 
with  length  of  satisfactory  performance.  A  more  thorough  validation  of 
standards  is  planned  in  a  follow-on  research  effort. 


CHAPTER  4 


DISCUSSION  AND  CONCLUSIONS 


In  this  report  we  have  addressed  two  basic  questions  about  perfor¬ 
mance  measures:  What  characteristics  do  performance  measures  have?  How 
do  we  measure  their  quality?  To  answer  these  questions  we  relied 
heavily  on  statistical  analysis  of  scores  that  represent  performance. 
From  a  measurement  point  of  view,  the  analysis  made  sense.  We  concluded 
that,  in  general,  we  had  three  satisfactory  measures  of  performance. 

Evaluation  of  job  performance,  however,  is  more  than  simply  making 
observations on workers and converting them to numbers. As discussed in
the  opening  chapter,  the  fundamental  consideration  in  measuring  perfor¬ 
mance  is  content  validity.  Content  validity,  in  contrast  to  predictive 
validity,  is  not  determined  simply  by  computing  the  correlation  between 
two  sets  of  numbers.  Content  validity  involves  deciding  what  a  job  is, 
defining  procedures  for  identifying  content  of  the  measures  and  for 
observing  behavior,  and  then  converting  the  observations  to  numbers  for 
completing  the  analysis.  If  the  observations  have  a  poor  content  foun¬ 
dation,  the  analysis,  of  course,  cannot  provide  much  meaningful  infor¬ 
mation.  In  this  chapter  we  discuss  some  philosophical  and  procedural 
issues  for  evaluating  job  performance. 

Our  focus  is  on  constructing  hands-on  and  written  proficiency 
tests.  These  types  of  tests  are  constructed  specifically  as  instruments 
to  evaluate  job  performance.  Training  grades  are  routinely  obtained  by 
the  Marine  Corps  and  used  for  making  personnel  decisions;  therefore, 
other  considerations  in  addition  to  those  discussed  in  this  chapter 
apply  to  their  development.  Job  proficiency  tests  used  for  research 
purposes  ordinarily  are  subject  to  more  rigorous  development  and  admin¬ 
istrative  procedures  than  are  measures  used  for  assigning  grades  in 
skill training courses. In general, researchers are able to exercise
more control over the quality of tests that they design from their
inception than over the quality of training grades that are provided by
the personnel system.

Measurement of job performance is done for a variety of purposes.
Three common purposes are to identify training deficiencies; to help in
personnel decisions, such as promotion and retention; and to serve as
criterion measures for validating selection and classification tests.
The proficiency tests used in this study were developed especially for
the last purpose. Although they may be useful for other purposes,
we have considered them only as criterion measures for validating the
ASVAB.


In  this  chapter  we  discuss  some  issues  surrounding  the  use  of  job 
performance  tests  as  criterion  measures  for  validating  qualification 
standards. In chapter 1, we considered the fundamental issue of content
validity: how should a job in the military environment be defined, and
who should define job requirements? Job experts are key to these
decisions, but their work too should be reviewed and approved from a
policy point of view. In this chapter we address the issue of scoring
the measures. The question of scoring is one of defining and maintaining
standards. The standards issue starts at the smallest unit of
observable behavior (answering an item or completing a step of a task)
and extends to satisfactory-unsatisfactory levels of performance on the
job. Expert judgment is required at each level of scoring (step, task,
job). Statistical analysis can guide the judgments, but not make them.
We  discuss  the  costs  of  developing,  administering,  and  analyzing  job 
performance  tests,  and  then  we  close  the  discussion  section  by  criti¬ 
quing  the  performance  tests  used  in  this  study.  Our  conclusions  con¬ 
sider  both  the  feasibility  and  cost  of  using  job  performance  measures  to 
validate  qualification  standards. 

SCORING  PROFICIENCY  TESTS 

Scoring  the  tests  means  that  observations  of  performance  are  con¬ 
verted  to  numbers.  The  observation  may  be  of  a  mark  on  an  answer  sheet, 
for  written  tests,  or  of  an  action  taken  by  the  examinee  when  completing 
a  step  in  a  hands-on  performance  test.  The  numbers  then  must  be 
assigned  meaning  about  level  of  performance,  which  requires  that  we  con¬ 
struct  a  score  scale.  The  examinees’  scores  are  interpreted  relative  to 
the  scale.  Because  we  want  to  evaluate  examinees’  scores  according  to 
some  reference  points,  the  scores  should  be  obtained  under  standard 
conditions  and  they  should  be  reasonably  accurate.  In  this  subsection, 
we  discuss  problems  with  satisfying  these  scoring  requirements. 

Converting  Observations  to  Numbers 


The  general  principle  for  converting  observations  to  numbers  is  to 
decompose  job  requirements  into  small  units  that  can  readily  be  scored 
as  pass-fail  or  correct-incorrect.  For  paper-and-pencil  tests,  this 
means  a  test  item;  for  hands-on  performance  tests,  this  usually  means  a 
step  in  performing  a  task.  For  written  multiple-choice  tests,  deciding 
on  the  correct  answer  is  relatively  easy.  A  panel  of  experts  can  review 
the  items  and  agree  on  the  right  answer.  Absence  of  agreement  indicates 
a  faulty  item.  Because  of  the  scoring  accuracy  and  administrative  con¬ 
venience,  paper-and-pencil  tests  have  enjoyed  great  popularity,  even  as 
measures  of  job  proficiency. 

For  hands-on  tests,  the  scoring  rules  are  more  difficult  to  estab¬ 
lish  and  follow.  Ordinarily,  a  task  involves  a  continuous  flow  of 
behavior.  For  measurement  purposes,  the  flow  is  segmented  into  observ¬ 
able  steps,  and  then  standards  are  established  about  passing  or  failing 
each  step.  Test  administrators  apply  the  scoring  rules  to  the  behavior 
of  examinees  as  the  examinees  attempt  to  perform  the  tasks.  Ideally, 
all  administrators  employ  the  same  standards  when  scoring  each  step,  and 
they  provide  identical  testing  situations,  such  as  using  the  same  verbal 
and body language. Later we shall return to problems of standardizing
testing  conditions. 

Attaching  Meaning  to  the  Scores 


In  all  tests,  the  meaning  of  the  scores  must  be  established.  The 
meaning  of  the  scores  is  always  relative  to  a  set  of  standards.  The 
pass-fail or correct-incorrect score, usually scored as 1 or 0, is
relative to the agreed-upon correct answer. The 1s and 0s are aggregated
to represent performance, and the total score is then placed on a
scale.  In  chapter  3,  we  spent  some  time  devising  a  satisfactory- 
unsatisfactory  scale  for  the  proficiency  tests.  In  chapter  2,  we  used  a 
continuous  scale,  with  no  explanations  or  apologies  for  what  we  were 
doing  (except  for  the  need  to  standardize  differences  between  test 
administrators).  Now  we  need  to  examine  more  carefully  how  we  con¬ 
structed  the  score  scales. 
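
As an illustration of the aggregation step, the following minimal sketch sums pass-fail step results into a raw score and a percent-correct score. It is written in Python for convenience. The function names and the sample data are hypothetical and are not taken from the scoring procedures used in the study.

```python
def total_score(step_results):
    """Count the steps (or items) scored as 1, that is, passed or correct."""
    return sum(1 for passed in step_results if passed)


def percent_correct(step_results):
    """Express the raw score as a percentage of the steps attempted."""
    return 100.0 * total_score(step_results) / len(step_results)


# Hypothetical examinee: four steps of one task, scored 1 (pass) or 0 (fail).
example_steps = [1, 0, 1, 1]
example_raw = total_score(example_steps)
example_pct = percent_correct(example_steps)
```

The raw score by itself carries no meaning about level of performance; that meaning comes only from the scale on which the score is placed.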

In  chapter  2,  we  assumed  that  equal  differences  between  scores  had 
the  same  meaning,  in  terms  of  performance,  throughout  the  scale;  that 
is,  a  difference  of,  say,  two  points  at  the  low  end  of  the  scale  was 
equal  to  a  difference  of  two  points  in  the  middle  or  high  end  of  the 
scale.  Without  this  assumption,  we  could  not  interpret  the  statistics 
we  used  (mean,  standard  deviation,  and  correlation  coefficient).  This 
assumption is reasonable, except perhaps for the piling up of scores at
the high end of the scale for the radio repair and mechanic hands-on
tests. Such piling up usually means that the score scale is
compressed, and the true differences in performance are larger than
those observed. The effect of the compression is to reduce observed
differences among the examinees and weaken the statistical relationships.
The credibility of the measures was supported in spite of the
compression, and our assumption about the meaning of the scores appears
reasonable.
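
The weakening effect of a compressed score scale can be illustrated with a small simulation. The sketch below is not a reanalysis of the study data. The sample size, the assumed correlation of 0.6 between aptitude and performance, the ceiling value, and all function names are hypothetical choices made only for illustration.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_ceiling(n=5000, true_r=0.6, ceiling=0.5, seed=0):
    """Return (r without ceiling, r with ceiling) for simulated scores.

    Aptitude and performance are drawn as standard normal variables with
    population correlation true_r (an assumption). Performance scores above
    `ceiling` are then cut off at that value, which mimics scores piling up
    at the high end of the scale.
    """
    rng = random.Random(seed)
    aptitude, performance = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        p = true_r * a + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1)
        aptitude.append(a)
        performance.append(p)
    capped = [min(p, ceiling) for p in performance]
    return pearson_r(aptitude, performance), pearson_r(aptitude, capped)
```

Under these assumptions the correlation computed from the capped scores is lower than the one computed from the uncapped scores, which is the direction of the effect described above.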

An  assumption  we  did  not  make  is  that  we  could  identify  a  score 
that  meant  zero  performance.  Zero  performance  would  mean  that  the  per¬ 
son  cannot  meet  any  of  the  job  requirements.  In  simple  domains,  we  can 
reasonably  establish  minimum  levels  of  competency.  Spelling  of  one- 
syllable  words  and  addition  of  two-digit  numbers  are  examples  of  where  a 
reasonable  zero  score  can  be  established.  Once  we  move  to  more  complex 
domains,  such  as  job  performance,  especially  with  adult  examinees,  then 
zero  levels  of  competence  or  performance  are  arbitrary. 

Setting  Standards  on  a  Performance  Test 


The  argument  about  a  zero  point  has  implications  for  the  meaning  of 
the  score  scale.  In  recent  years,  a  movement  has  grown  to  use  "criterion- 
referenced"  standards  to  evaluate  performance.  With  criterion-referenced 
standards,  an  a  priori  passing  score  is  established  on  the  measure. 
Examinees  who  meet  the  passing  score  are  said  to  be  satisfactory,  or 
competent,  or  to  have  mastered  the  domain.  The  number  of  examinees 
who attain a passing score is irrelevant to the setting of
criterion-referenced standards. Note that the written test items or steps of the
hands-on tests use criterion-referenced standards in the sense that the
pass-fail score is a priori, regardless of its difficulty. Complications
arise, however, when we attempt to characterize performance in a
complex skill as satisfactory or unsatisfactory. Aggregating a series of
1s and 0s and then deciding on a passing score for the number of 1s
inherently involves arbitrary decisions.

The  arbitrary  nature  of  standards  is  illustrated  by  the  different 
standards  used  by  the  test  administrators  in  this  study  when  they  were 
nominally  observing  the  same  performance.  In  the  hands-on  testing, 
conditions  were  reasonably  standardized  in  contrast  to  realistic  job 
conditions,  and  even  so  administrators  employed  different  standards. 
Also,  standards  of  satisfactory  performance  can  be  varied  depending  on 
personnel  supply.  When  competent  people  are  plentiful,  personnel 
managers  and  unit  commanders  raise  standards.  When  the  need  for  per¬ 
sonnel  is  great  and  the  supply  is  limited,  as  during  mobilization,  then 
standards  tend  to  be  lowered.  Workers  can  compensate  for  personal  weak¬ 
nesses,  and  supervisors  can  restructure  job  requirements  to  capitalize 
on  the  strengths  of  the  workers  assigned  to  them.  Although  there  is 
intuitive  appeal  to  a  true  zero  point  of  satisfactory  performance,  in 
practice, minimum satisfactory performance can be modified with changed
conditions.

The score scale we used is called "norm-referenced." In norm-referenced
scales, the meaning of the scores is determined by the
relative performance of examinees on the measure. We compare scores
relative to the other scores in the distribution. We use the mean as
the zero point, and assign meaning to scores based on their distance
from the mean.* If a test is easy, then the mean score is high
compared to a difficult test, but relative standing of the examinees
remains unchanged. With a criterion-referenced scale, the difficulty of
the test is crucial to the standing (satisfactory or unsatisfactory) of
the examinees.
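
The contrast between the two kinds of scale can be made concrete with a short sketch. The scores, the fixed passing score of 20, and the function names below are hypothetical. The "easy" form is constructed by assuming that every examinee gains exactly 5 points, which is a simplification.

```python
import statistics


def z_scores(raw_scores):
    """Norm-referenced scores: distance from the mean in standard deviation units."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [(s - mean) / sd for s in raw_scores]


def pass_rate(raw_scores, passing_score):
    """Criterion-referenced summary: share of examinees at or above a fixed score."""
    return sum(s >= passing_score for s in raw_scores) / len(raw_scores)


# Hypothetical raw scores for the same five examinees on two forms.
hard_form = [12, 15, 18, 20, 25]
easy_form = [s + 5 for s in hard_form]  # assumption: everyone gains 5 points

norm_hard = z_scores(hard_form)
norm_easy = z_scores(easy_form)
rate_hard = pass_rate(hard_form, 20)
rate_easy = pass_rate(easy_form, 20)
```

In this example the norm-referenced scores are identical on the two forms, while the proportion of examinees judged satisfactory against the fixed passing score changes with the difficulty of the form.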

Given the types of performance measures used in this study, we are
willing to assume a norm-referenced scale, and then derive standards
from that type of scale. We are unwilling to assume an absolute
dichotomy between satisfactory and unsatisfactory performance that is
based on realistic job requirements. We plan to continue with
norm-referenced scales in future research efforts unless new evidence emerges
that criterion-referenced standards are meaningful for military skills.


* Percentile scores, which show rank order in a distribution, are also
norm referenced, and they provide essentially the same information as
distance from the mean.

Proficiency  Versus  Productivity 


The  preceding  discussion  about  the  meaning  of  the  test  scores 
emphasized  their  arbitrary  nature.  The  score  scale  for  the  proficiency 
tests,  as  for  training  grades,  is  an  abstraction  that  cannot  be  trans¬ 
lated  into  units  of  production.  From  these  scores,  we  cannot  tell  how 
many  radios  a  repairer  can  fix  each  day,  or  how  many  jeeps  a  mechanic 
can tune up, or how many enemy troops a rifleman can render harmless.
The scores only tell us which examinees are better than others.

The  test  scores  do  not  permit  inferences  about  tradeoffs  between 
number  of  workers  and  performance.  We  cannot  say,  for  example,  that  two 
people  with  a  score  of  40  are  equal  to  one  person  with  a  score  of  60  or 
80,  or  for  that  matter,  any  combination  of  people  and  scores.  The  score 
scale  is  too  weak  to  permit  extrapolations  into  how  much  work  people  can 
produce  in  the  normal  job  environment. 

If  the  score  scale  is  that  weak,  how  do  we  know  that  it  is  mea¬ 
suring  anything  of  value?  The  best  evidence  we  have  is  from  the  proce¬ 
dures  used  to  construct  the  tests.  Job  experts  said  that  the  content 
reflects  job  requirements.  If  their  judgments  are  wrong,  then  the  tests 
have  no  content  validity.  Even  if  their  judgments  are  right,  we  still 
must  proceed  by  assumption.  We  cannot  set  up  an  experiment  to  demon¬ 
strate  that  the  score  scale  accurately  reflects  performance  on  job 
requirements.  The  best  evidence  is  agreement  among  job  experts,  sup¬ 
ported  by  statistical  analysis.  We  can  build  a  plausible  argument  that 
the  scores  provide  meaningful  information.  Strictly  speaking,  the 
meaning  is  limited  to  inferences  about  relative  performance  in  a  testing 
environment.  We  assume  that  we  can  generalize  from  test  scores  to  per¬ 
formance  on  the  job,  but  we  cannot  build  a  confidence  interval.  Neither 
can  we  tell  supervisors  or  managers  how  to  convert  the  scores  into  units 
of  production  on  the  job.  The  tests  were  designed  as  criterion  measures 
for  validating  the  ASVAB,  and  the  score  scale  does  permit  such  a  use. 

Standard  Testing  Conditions 

Because  the  reason  for  placing  test  scores  on  a  scale  is  to  attach 
meaning  to  them,  scores  that  have  the  same  value  should  reflect  the  same 
level  of  performance.  The  best  way  to  ensure  equal  meaning  of  the 
scores  is  to  use  standard  testing  conditions.  If  all  examinees  perform 
the  same  tasks,  and  the  same  scoring  rules  are  applied  equally  to  all 
examinees,  then  the  same  scores  tend  to  have  the  same  meaning. 

In  chapter  1,  we  discussed  the  content  of  the  tests,  and  one  possi¬ 
bility  is  to  construct  a  different  test  for  each  examinee  to  cover 
unique  job  requirements.  Such  a  procedure  violates  standard  testing 
conditions, and special procedures are required to put all scores on the
same scale. The tests would need to be equated or calibrated before the
scores can be compared.

The  experience  of  the  military  services  in  equating  or  calibrating 
different  versions  of  the  ASVAB  is  instructive  for  putting  the  different 
performance  tests  on  the  same  score  scale.  In  the  testing  jargon,  we 
speak  of  equating  tests  if  they  have  parallel  content;  if  they  have 
somewhat  different  content,  then  we  speak  of  calibrating  them.  The 
equating  of  tests  is  invariant;  the  equality  of  the  scores  applies  to 
all  possible  samples  of  examinees.  Calibration,  however,  is  sample 
unique.  In  calibration,  different  scores  can  be  set  equal  to  each  other 
depending  on  the  characteristics  of  the  sample  used  to  calibrate  the 
tests.  If  the  different  performance  tests  are  parallel,  which  means 
they  are  measuring  the  same  thing  except  for  trivial  differences  in  the 
content,  then  putting  them  on  the  same  scale  is  relatively  easy.  If 
they  are  measuring  the  same  thing,  however,  there  is  no  need  to  develop 
a  different  test  for  each  duty  assignment.  The  different  tests,  there¬ 
fore,  must  be  measuring  somewhat  different  things,  which  means  we  cannot 
be  certain  that  the  same  scores  mean  the  same  level  of  performance. 
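
To show what placing two tests on the same scale involves, the sketch below uses linear equating, a textbook method that matches the means and standard deviations of two score distributions. It is assumed here only for illustration and is not presented as the procedure used for the ASVAB. The scores and the function name are hypothetical.

```python
import statistics


def linear_equate(x, form_x_scores, form_y_scores):
    """Convert score x on form X to the scale of form Y.

    The conversion matches the mean and standard deviation of the two
    samples: y = mean_y + (sd_y / sd_x) * (x - mean_x).
    """
    mean_x = statistics.fmean(form_x_scores)
    mean_y = statistics.fmean(form_y_scores)
    sd_x = statistics.pstdev(form_x_scores)
    sd_y = statistics.pstdev(form_y_scores)
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Hypothetical samples of examinees who took each form.
form_x = [10, 20, 30]
form_y = [40, 50, 60]
converted = linear_equate(20, form_x, form_y)
```

Because the conversion is computed from sample means and standard deviations, a different sample of examinees can yield a different conversion. When the tests are not parallel, this dependence on the sample is the weakness of calibration noted above.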

When  ASVAB  tests  are  equated,  the  preferred  sample  size  is  about 
2,000  examinees.  Calibrating  two  tests  requires  even  larger  samples  to 
help  obtain  representativeness  and  permit  generalization  to  the  popula¬ 
tion  of  examinees.  Given  the  expense  of  administering  hands-on  per¬ 
formance  tests,  there  is  no  way  that  adequate  samples  can  be  obtained. 
With  the  sample  sizes  that  are  feasible,  say  up  to  100  examinees  who 
would  take  exactly  the  same  test,  the  calibration  remains  dubious. 
Comparison  of  examinees,  on  either  a  criterion-referenced  or  norm- 
referenced  scale,  tested  with  different  performance  measures,  then  could 
not  be  done  with  confidence. 

COST  OF  JOB  PERFORMANCE  TESTING 

Developing  and  administering  job  performance  tests,  especially  the 
hands-on  tests,  is  expensive  in  terms  of  money  and  people.  The  approxi¬ 
mate  costs  to  the  Marine  Corps  for  developing,  administering,  and 
analyzing  the  job  performance  tests  used  in  this  study  are  shown  in 
table  21.  The  development  costs  for  these  tests  are  an  absolute 
minimum.  Future  efforts  to  develop  job  performance  tests  would  be  more 
costly.  The  figures  are  based  on  Marine  Corps  experience  for  the  three 
tests  developed  for  this  study,  and  adjusted  to  incorporate  decisions 
about  sample  size  made  by  the  Joint  Services  Job  Performance  Measurement 
Working  Group. 

Test  Development 


The  test  development  process  requires  close  coordination  between 
job  and  testing  experts.  The  job  experts  should  be  intimately  familiar 
with  the  job  requirements,  including  experience  in  performing  the  job 
tasks  themselves  and  in  supervising  the  performance  of  others.  One  of 
their tasks is to translate job requirements into terms and concepts
commonly  used  by  workers  in  the  specialty.  The  testing  experts  should 
know  how  to  structure  job  requirements  to  enhance  test  validity  and  how 
to  exercise  quality  control. 


TABLE 21

APPROXIMATE COSTS OF A JOB PERFORMANCE TEST

                                     Cost
Activity                      Days       Dollars

Test development
  Test experts                 190        80,000
  Job experts                  250        30,000
  Overhead                                55,000
  Subtotal                               165,000

Test administration
  Examinees                    400        40,000
  Administrators               400        50,000
  Overhead                                45,000
  Subtotal                               135,000

Analysis and reporting
  Analysts                     125        60,000

Total                                    360,000

Structuring  the  job  requirements  means  that  the  test  content 
includes  only  skills  and  knowledge  essential  to  performing  job  tasks  and 
excludes  trivial  bits  of  information.  A  key  component  in  the  develop¬ 
mental  process  is  the  task  analysis,  in  which  the  steps  required  to 
perform  a  task  are  clearly  specified.  These  steps  serve  as  the  building 
blocks  for  constructing  the  hands-on  and  written  tests.  The  job  experts 
specify  the  steps,  and  the  test  experts  help  translate  the  steps  into 
test  content.  Before  the  tests  are  administered  to  the  large  sample, 
they  should  be  tried  out  on  small  groups  to  make  sure  they  provide  valid 
measures — scoring  accuracy,  consistency  of  measurement,  and  no  com¬ 
plaints  about  unclear  directions  or  questions. 

The time required to develop the job performance tests used in this
study was about 9 months of testing expert's time. At about $110,000
per professional year, the cost to contract for testing experts is about
$80,000. More time of the job experts is required, but at a lower
cost. Two job experts, each working about 6 months, were involved in
constructing their tests. The cost for job experts, who normally are
noncommissioned officers, is about $30,000 per year. This figure
includes costs for pay and allowances (P&A) and for organization and
maintenance (O&M); other costs such as retirement and travel are not
included. The overhead cost of $55,000 included managers from Marine
Corps Headquarters (3 months), local support from the installation where
the tests were developed (4 months), CNA analysts (2 months), and
travel. The cost for developing a test, including tryout and evaluation,
was $165,000.

Test  Administration 


Administering  hands-on  performance  tests  is  expensive  because  only 
one  examinee  can  be  tested  at  a  time  by  a  test  administrator.  The  costs 
are  based  on  400  examinees  in  the  sample,  each  tested  for  a  full  day, 
and  400  days  of  test  administrator  time.  The  examinees  typically  are 
first-term  enlisted  personnel,  who  cost  about  $100  per  day  for  pay  and 
allowances  and  for  organization  and  maintenance  costs.  The  Job  Perfor¬ 
mance  Working  Group  has  decided  that  300  examinees  constitute  an  ade¬ 
quate  sample  for  validation  purposes.  To  obtain  300  usable  cases,  we 
estimate  that  400  examinees  need  to  be  scheduled. 

To  test  the  400  examinees,  400  mandays  of  test  administrator  time 
need  to  be  allocated.  Even  though  the  Marine  Corps  hands-on  test  lasted 
only  one-half  day,  a  full  day  of  administrator’s  time  was  required.  The 
extra  time  was  spent  setting  up  and  maintaining  equipment,  scheduling 
examinees,  and  taking  time  to  regroup.  The  test  administrators  cost 
about  $125  per  day,  or  $50,000  for  400  mandays.  These  costs  are  for 
military test administrators, but the cost for civilians would be about
the same.

The  overhead  cost  of  $45,000  for  test  administration  included 
management  by  Marine  Corps  Headquarters  (2  months)  and  by  the  local 
installation  (3  months),  plus  2  months  of  an  analyst’s  time  to  help 
exercise  quality  control  over  the  way  hands-on  tests  are  administered 
and  scored. 

Analysis  and  Reporting 


Analysis  of  the  data  and  preparation  of  a  report  required  about 
one-half  of  a  professional  year.  The  figure  of  $60,000  included  the 
time  of  research  assistants,  editors,  managers,  and  analysts.  The 
analysis  cost  would  be  considerably  higher  if  the  costs  of  recruiting 
and  training  Marines  were  considered  more  systematically  than  we  did; 
such  a  comprehensive  analysis,  which  helped  provide  the  impetus  for  the 
joint-service  project,  was  conducted  by  Armor  [7]. 

The  total  cost  to  develop,  administer,  and  analyze  a  job  perfor¬ 
mance  test  was  about  $360,000.  The  cost  per  examinee  with  usable  data 
was  over  $1,000.  For  research  purposes,  where  a  limited  number  of 
specialties  are  tested,  the  expense  is  tolerable.  Should,  however,  job 
performance  testing  be  conducted  for  a  large  number  of  specialties,  the 
costs would add up quickly. The Marine Corps, the smallest of the services,
has over 60 specialties with more than 100 new recruits assigned
each year. These specialties would provide enough examinees to permit
validation of qualification standards. The cost for one-time testing of
the 60 specialties computed at the same rate would be over $20 million
(60 x $360,000). If the tests were recurring or if they also covered
second-term personnel, the costs would be even higher.
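
The arithmetic behind these figures can be retraced with the short sketch below. The dollar amounts, the 300 usable cases, and the 60 specialties are taken from table 21 and the text. The variable and function names are illustrative only.

```python
# Dollar costs from table 21, grouped by activity.
TABLE_21 = {
    "test_development": {
        "test_experts": 80_000,
        "job_experts": 30_000,
        "overhead": 55_000,
    },
    "test_administration": {
        "examinees": 40_000,
        "administrators": 50_000,
        "overhead": 45_000,
    },
    "analysis_and_reporting": {
        "analysts": 60_000,
    },
}

USABLE_CASES = 300   # examinees with usable data, out of 400 scheduled
SPECIALTIES = 60     # specialties with more than 100 new recruits per year


def subtotal(activity):
    """Sum the cost lines listed under one activity."""
    return sum(TABLE_21[activity].values())


def total_cost():
    """Total cost to develop, administer, and analyze one test."""
    return sum(subtotal(activity) for activity in TABLE_21)


cost_per_usable_case = total_cost() / USABLE_CASES
cost_all_specialties = SPECIALTIES * total_cost()
```

The per-examinee figure divides the total by the 300 usable cases rather than the 400 examinees scheduled, which is why it exceeds $1,000.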

In  addition  to  monetary  and  personnel  costs,  there  are  hidden  costs 
to  the  installations  where  the  tests  are  administered.  Testing  disrupts 
the  normal  activities  of  units.  Noncommissioned  officers  are  in  short 
supply,  and  units  are  reluctant  to  release  them  for  working  on  the 
tests.  Units  are  also  reluctant  to  dedicate  expensive  equipment,  such 
as  trucks,  tanks,  or  planes,  to  support  the  testing.  Widescale  testing 
would  impose  an  onerous  burden  on  units  and  therefore  is  not  feasible. 

The  Joint  Services  Job  Performance  Measurement  Working  Group  has 
wisely  decided  that  a  major  goal  of  the  research  program  is  to  find 
valid  measures  that  can  be  used  in  lieu  of  the  hands-on  performance 
tests.  A  significant  component  of  the  research  program  is  the  evalua¬ 
tion  of  training  grades  as  a  criterion  measure  for  validating  qualifica¬ 
tion  standards.  A  desirable  outcome  of  the  research  program  is  to 
identify  the  types  of  specialties  for  which  training  grades,  or  other 
less  expensive  performance  measures,  can  serve  as  satisfactory  criteria 
to  validate  qualification  standards.  For  these  specialties  the  expense 
of  hands-on  testing  can  then  be  avoided. 

CRITIQUE  OF  THE  PROFICIENCY  TESTS  USED  IN  THIS  STUDY 

The hands-on and written proficiency tests used in this study generally
were satisfactory. Copies of the tests are contained in [8]. If
we were doing the study again, however, we would attempt to have somewhat
different tests. The procedures for determining the content areas
to cover with tests were appropriate. Job experts were consulted to
ensure that the critical requirements were included. When the requirements
were translated into observable behavior (written test items and
hands-on tests), we would have preferred that the tests place greater
emphasis on requiring examinees to apply their skills and knowledge to
performing tasks. Many of the written test items asked examinees about
abstract facts and principles; by presenting work-related problems and
having the examinees say what they would do, we believe the content
validity would have been enhanced.

The  hands-on  performance  tests  reflected  three  different  design 
strategies.  The  radio  repair  test  used  a  new  piece  of  equipment  that  no 
examinee  had  worked  on  before.  They  had  to  apply  their  troubleshooting 
skills  during  the  test.  This  test  should  permit  maximum  generalization 
to  requirements  in  the  skill,  but  not  necessarily  describe  how  well 
examinees  perform  the  tasks  in  their  current  duty  assignment.  The 
mechanics  test  asked  the  examinees  to  perform  tasks  that  they  normally 
encounter in their daily work: tuning up a quarter-ton Jeep and working
on the wheels and brakes. Both describe their proficiency in the current
assignments and should generalize to the skill.

The  rifleman  hands-on  test  was  a  mixture  of  doing  and  knowing,  with 
some  stress  built  in.  In  the  doing  parts,  they  fired  their  weapons  at 
pop-up  targets  while  negotiating  a  firing  range  with  explosives  going 
off  around  them.  They  also  encountered  dummies  on  whom  they  were  sup¬ 
posed  to  perform  first  aid.  In  the  knowing  parts,  they  were  asked  to 
identify hand signals printed on a card and to identify map symbols.
Even with hands-on tests, tasks can be presented abstractly, rather than
in  a  job  functional  context.  In  general,  the  tasks  reflect  combat 
requirements,  and  the  realism  is  a  matter  of  feasibility. 

Although  the  tests  probably  could  be  improved,  they  were  adequate 
for  this  feasibility  study.  There  are  no  certain  rules  for  developing 
proficiency  tests,  and  different  people  employ  different  strategies. 
Perhaps  as  we  gain  more  experience  in  building  proficiency  tests,  the 
researchers  and  personnel  managers  can  attain  greater  agreement  about 
what  a  good  performance  measure  should  look  like. 

SUMMARY  AND  CONCLUSIONS 

The  most  important  conclusion  is  that  it  is  feasible  to  validate 
qualification  standards  against  job  performance.  The  ASVAB  is  a  valid 
predictor  of  job  performance,  as  measured  by  hands-on  proficiency  tests, 
written  proficiency  tests,  and  grades  in  skill  training.  By  making 
reasonable  assumptions  about  the  difficulty  of  the  skills  and  acceptable 
rates  of  unsatisfactory  performers,  we  computed  a  new  set  of  qualifying 
standards  that  correspond  closely  to  the  current  ASVAB  standards  for 
assigning  recruits  to  the  three  skills  in  the  study  (Ground  Radio 
Repair,  Automotive  Mechanic,  and  Infantry  Rifleman).  We  established  the 
credibility  of  the  three  performance  measures  in  terms  of  content 
validity  and  accuracy  of  the  scores.  The  predictability  of  the  perfor¬ 
mance  measures  by  the  ASVAB  conforms  to  prior  experience,  and  the  quali¬ 
fication  standards  using  job  performance  tests  as  the  criterion  measure 
agree  closely  with  current  Marine  Corps  qualifying  standards.  Hence,  we 
reach  our  conclusion  that  qualification  standards  can  be  validated 
against  job  performance. 

The  second  main  conclusion  is  that  measuring  job  performance  is  a 
complex  and  expensive  process  that  produces  uncertain  results.  In  the 
lengthy  introductory  and  discussion  chapters,  we  raised  some  of  the 
issues, problems, and pitfalls related to measuring job performance.
Even in those long pages, we skimmed over most of the topics, and many
readers  will  undoubtedly  say  that  we  omitted  some  of  the  most  important 
ones.  Our  intent  was  not  to  resolve  the  issues,  but  to  point  them  out 
and  move  toward  a  possible  resolution.  One  reason  we  dwelled  on  the 
complications  is  that  the  military  services  currently  are  embarking  on 
an  extensive  research  program  to  validate  qualification  standards 
against job performance. An interservice working group on job performance
measurement was established in fall 1982. The working group is
responsible  for  formulating  and  coordinating  an  effective  program  that 
satisfies  the  needs  of  personnel  managers  and  conforms  to  good  scien¬ 
tific  practice.  Our  hope  is  that  this  preliminary  effort  will  be  useful 
to  help  formulate  an  effective  and  efficient  research  program. 

The  remaining  conclusions  will  be  presented  more  briefly: 

•  When  developing  proficiency  tests,  panels  of  job  experts 
should  review  the  procedures  and  products  throughout  the 
entire  process  to  help  ensure  content  validity. 

•  The  agency  responsible  for  developing  tests  should  also 
have responsibility for scoring them. Complicated test
items  may  appear  to  have  greater  content  validity,  but 
with  extra  care  they  can  be  made  more  convenient  for  both 
examinees  and  scorers,  with  probably  little  or  no  loss  in 
validity. 

•  Inexpensive  data,  such  as  ratings,  training  grades,  and 
ASVAB  scores,  should  be  collected  before  administering  the 
expensive  hands-on  and  written  proficiency  tests.  Testing 
resources  should  not  be  wasted  on  examinees  who  must  be 
deleted  from  the  sample  because  of  missing  data. 

•  Test  administrators  should  be  trained  to  provide  uniform 
testing  conditions.  Administrators  should  be  consistent 
in  the  amount  and  type  of  help  they  give  examinees  and  in 
the  scoring  standards  they  use. 

•  Norm-referenced,  rather  than  criterion-referenced,  score 
scales  and  standards  for  satisfactory  performance  should 
be  used  for  the  type  of  measures  used  in  this  study. 

The  final  conclusion  is  that  even  though  we  can  validate  qualifica¬ 
tion  standards  against  hands-on  job  performance  tests,  we  may  not  always 
want  to.  Perhaps  in  the  technical  skills,  represented  by  Ground  Radio 
Repair  and  Automotive  Mechanic  in  this  study,  the  traditional  criterion 
measure  of  grades  in  skill  training  courses  may  be  satisfactory.  In 
nontechnical  skills,  represented  by  Infantry  Rifleman,  hands-on  and 
written  proficiency  tests  may  provide  information  about  performance  not 
available  from  other  sources.  Although  no  firm  conclusion  can  be  drawn 
until  the  usefulness  of  training  grades  as  valid  criterion  measures  of 
performance  is  documented  by  an  extensive  body  of  research  results,  they 
do  have  sufficient  promise  as  measures  of  performance  that  they  should 
be  retained  for  all  recruits  in  all  training  courses.  The  grades  should 
be  numerical  scores  as  traditionally  reported. 

In  summary,  we  set  out  to  determine  the  feasibility  of  validating
ASVAB  qualifying  scores  against  measures  of  job  performance.  Through 
extensive  correlational  analysis,  we  established  the  credibility  of  the 
performance  measures  and  their  predictability  by  the  ASVAB.  We  computed 
qualifying  ASVAB  scores  for  each  skill,  and  the  standards  were  reason¬ 
able.  We  then  presented  some  topics  for  consideration  when  designing 
further  research  efforts  on  validating  ASVAB  qualification  standards. 

DEVELOPMENT  OF  THE  PROFICIENCY  TESTS


INTRODUCTION 

The  hands-on  and  written  proficiency  tests  were  developed  through 
the  joint  efforts  of  test  experts  from  the  Navy  Personnel  Research  and 
Development  Center  (NPRDC)  and  Marine  Corps  job  experts  from  Camp 
Pendleton,  CA.  The  team  for  each  test  was  composed  of  two  job  experts 
and  one  test  expert.  The  test  development  procedures  and  content  are 
described  in  an  NPRDC  report  [A-1].  In  this  appendix,  we  extract
salient  information  from  the  NPRDC  report  and  supplement  it  with  our 
observations  on  the  tests. 

THE  DEVELOPMENT  PROCEDURES 

The  general  approach  for  test  development  used  by  NPRDC  is  as
follows:

1.  Identify  tasks  performed  in  each  skill  and  group  them  by 
major  task  areas 

2.  Rank  the  task  areas  according  to  how  well  they  predict 
performance  in  the  skill  and  rank  the  tasks  in  each  task
area  according  to  how  well  they  predict  performance  in  the 
area 

3.  Construct  hands-on  and  written  tests  for  the  top-ranked 
tasks,  based  on  their  suitability  for  hands-on  or 
performance-oriented  written  tests,  taking  into  account 
logistical  feasibility 

4.  Conduct  field  tryouts. 

Each  test  was  developed  by  a  different  team  of  test  and  Marine  Corps  job 
experts.  The  teams  adapted  the  general  procedures  to  the  unique 
requirements  of  each  skill. 

Ground  Radio  Repair 


Eight  Marine  Corps  job  experts  reviewed  the  task  areas  for  the 
Ground  Radio  Repair  skill.  The  clear  consensus  was  that  the  most 
predictive  task  area  was  troubleshooting.  Because  most  circuits  are 
similar,  the  actual  equipment  used  for  troubleshooting  was  not 
considered  important.  To  minimize  experience  with  specific  pieces  of 
equipment,  however,  the  team  decided  to  use  the  AN/UIQ-10  amplifier,
which  had  not  yet  been  issued  to  the  field. 

Examinees  could  consult  technical  manuals  and  troubleshooting
charts  during  the  hands-on  and  written  tests.  The  hands-on  test  was 
tried  out  on  five  examinees.  The  written  test  was  tried  out  on  ten 
examinees.  The  team  was  satisfied  with  its  efforts  on  the  hands-on 
test:


Since  the  examinee  recorded  his  troubleshooting  diagnosis  on 
an  answer  sheet,  the  examiner  was  not  required  to  make  any 
judgments  as  to  what  procedures  were  followed.  Therefore, 
scoring  was  completely  objective  and  scoring  reliability 
posed  no  problem.  The  only  problem  worthy  of  note  was  the 
time  required  to  procure,  install,  and  check  the  three 
AN/UIQ-10  amplifiers  and  all  the  instruments  and  materials 
needed  (pp.  8  and  9). 

In  the  development  of  the  written  test,  "no  problems  worthy  of  note  were
encountered."

Automotive  Mechanic 


Initial  efforts  to  identify  task  areas  covered  a  broader  scope  of 
job  requirements  than  those  specific  to  the  organizational  level  automotive
mechanic.  The  first  set  of  job  experts  ranked  the  comprehensive
set  of  requirements  that  included  vehicle  recovery,  electrical  systems, 
and  intermediate  level  of  maintenance,  as  well  as  organizational 
level.  The  two  Marine  Corps  job  experts  assigned  to  the  test  develop¬ 
ment  team  decided  that  a  major  engine  tune-up  of  the  M-151  Jeep  would  be 
the  best  predictor  of  proficiency  of  the  organizational  level  Automotive 
Mechanic  skill.  The  tune-up  was  supplemented  with  a  test  on  wheel  and 
brake  maintenance.  The  hands-on  test  was  tried  out  on  six  examinees. 

The  written  test  focused  on  the  M-54  truck  with  a  multifuel  engine.  The 
written  test  was  tried  out  on  the  six  examinees  who  took  the  hands-on 
test,  plus  three  others. 

Scoring  the  hands-on  test  was  not  perceived  as  a  problem: 

The  two  job  experts,  who  had  developed  the  tests,  observed 
the  six  subjects  independently  during  the  field  tryout  and 
eliminated  scoring  ambiguities  as  they  appeared.  As  a 
result,  by  the  end  of  this  period,  the  two  experts  were  in 
perfect  agreement  and  scoring  reliability  seemed  to  pose  no 
problem.  However,  it  should  be  noted  that,  as  was  true  of 
the  other  skills,  the  original  tryout  plan  called  for  five 
additional  examinees  to  check  scoring  reliability,  but  they 
could  not  be  provided....  It  is  believed  that  additional  tryouts
would  have  yielded  the  same  results  (i.e.,  the  tests
would  have  very  high  scoring  reliability),  because  steps 
within  the  tests  were  carefully  constructed  to  be  very 
specific  and  objective  (p.  6). 

Infantry  Rifleman


For  the  Infantry  Rifleman  skill,  16  major  test  areas,  and  tasks  in 
each  area,  were  ranked  by  87  Marine  Corps  job  experts.  The  hands-on
tests  were  intended  to  parallel  combat  conditions  as  much  as  possible,
but  to  avoid  conditions  that  could  cause  injury.  The  hands-on  test  was
administered  to  81  examinees — tea  groups  of  5  to  8  at  a  time.  The  two 
job  experts,  plus  a  third  job  expert,  scored  the  tests  in  the  final 
tryout  "with  virtually  perfect  agreement."  The  NPRDC  report  does, 
however,  raise  some  cautions  about  administering  the  hands-on  test  for 
riflemen: 

Some  examiners  departed  from  the  prescribed  instructions  for 
administering  some  hands-on  tests  by  ad-libbing,  rephrasing 
questions,  inadvertently  giving  clues  to  the  answer  by 
gestures,  and  providing  more  orientation  to  a  test  than 
prescribed.  Some  NCO  examiners  had  difficulty  in  avoiding 
'training'  the  Marine  instead  of  testing  him.  These  difficulties
were  eliminated  by  conducting  training  classes  for
the  examiners  (p.  4). 

The  written  test  was  also  administered  to  the  same  81  Marines  who  took 
the  hands-on  test.  Although  some  examinees  had  difficulty  in  reading 
certain  test  items,  this  problem  was  "mitigated  by  carefully  rewriting 
those  items.  No  other  problems  worthy  of  note  were  encountered  in 
developing  the  rifleman  written  test"  (p.  4). 

The  test  development  procedures  conformed  to  current  state-of-the- 
art  practices  for  proficiency  testing.  Job  experts  were  intimately 
involved  throughout  the  process,  which  should  enhance  content  validity. 

SCORING  THE  TESTS 

Test  booklets  and  answer  sheets  started  arriving  at  CNA  in  about 
September  1981.  We  started  scoring  them  immediately.  Scoring  the 
rifleman  tests  was  not  completed  until  September  1982,  a  year  later. 

Most  of  the  tests  used  were  scored  using  computer  programs.  The 
examinees'  responses  to  each  item  on  the  written  tests  and  the  administrative
notation  of  either  pass  or  fail  for  each  hands-on  test  were
entered  into  a  data  base  for  each  skill.  Programs  were  written  to  score 
each  test  in  accordance  with  the  scoring  schemes  devised  by  the  job 
experts.  Scores  were  then  computed  for  examinees  by  processing  their 
responses  through  the  appropriate  scoring  program. 

Ground  Radio  Repair 


Scoring  the  Ground  Radio  Repair  tests  was  fairly  straightforward, 
but  did  have  some  complexities.  The  written  test  had  two  parts.  The 
first  contained  52  items,  and  the  second  had  7.  Part  one  consisted  of 
38  unit-weighted  multiple  choice  questions;  the  responses  were  recorded
on  a  separate  answer  sheet.  The  remaining  questions  of  part  1  were
worth  1,  2,  or  3  points,  with  the  weight  for  each  question  specified  by
the  job  experts.  Seven  of  these  remaining  questions  were  open  ended,
with  points  awarded  based  on  the  precision  of  the  response.  Part  2
consisted  entirely  of  multiple  choice  questions.  Three  points  were 
awarded  for  each  correct  response.  A  total  written  score  was  calculated 
by  summing  the  scores  of  parts  1  and  2. 
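
The written-test scoring rule described above can be sketched in Python as follows. This is an illustration only, not the scoring program used in the study: the function name, argument layout, and example responses are hypothetical, and the count of 14 weighted items is simply the 52 items of part 1 less the 38 unit-weighted ones.

```python
def radio_written_score(unit_items_correct, weighted_points_earned, part2_correct):
    """Total written score for the Ground Radio Repair test as described in the text.

    unit_items_correct     -- 38 booleans, one per unit-weighted item in part 1
    weighted_points_earned -- points earned on the 14 remaining part 1 items (0 to 3 each)
    part2_correct          -- 7 booleans, one per multiple choice item in part 2
    """
    if len(unit_items_correct) != 38 or len(weighted_points_earned) != 14:
        raise ValueError("part 1 has 38 unit-weighted items and 14 weighted items")
    if len(part2_correct) != 7:
        raise ValueError("part 2 has 7 items")
    part1 = sum(1 for ok in unit_items_correct if ok) + sum(weighted_points_earned)
    part2 = 3 * sum(1 for ok in part2_correct if ok)  # three points per correct response
    return part1 + part2


# Hypothetical examinee: 30 of 38 unit items correct, 21 weighted points, 5 of 7 in part 2.
example_total = radio_written_score(
    [True] * 30 + [False] * 8,
    [1, 2, 3, 0, 2, 1, 3, 2, 0, 1, 1, 2, 3, 0],
    [True] * 5 + [False] * 2,
)
```

Partial credit on the open-ended items is represented here only by the points already assigned by a scorer; the precision rules themselves are not given in the text.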

Most  of  the  hands-on  portion  for  the  Ground  Radio  Repair  skill  was 
scored  by  the  administrator  while  giving  the  test.  The  hands-on  test 
had  10  scorable  units  that  were  divided  into  three  stages:  identify 
faulty  symptom,  circuit,  and  component.  Examinees'  responses  to  each
step  were  recorded  on  an  answer  sheet.  The  point  values  associated  with
the  steps  of  identifying  the  faulty  symptoms  and  circuits  were  recorded
along  with  the  response.  Identifying  the  faulty  component  was  scored  by
hand  using  a  given  scoring  pattern  ranging  from  0  to  8.  This  pattern
was  based  on  the  number  of  attempts  the  examinees  made  at  identifying 
the  faulty  component.  Each  examinee  was  allowed  as  many  attempts  as 
desired  within  the  given  time  limit  of  30  minutes  per  board,  and 
210  minutes  total  testing  time.  No  feedback  was  to  be  given  concerning 
the  accuracy  of  the  response  because  the  scoring  rules  involved  a
penalty  for  progressing  from  a  correct  to  an  incorrect  response.  The
individual  responses  to  each  step  were  fairly  lengthy;  therefore,  just 
the  points  scored  on  each  step  were  entered  into  the  data  base.  The 
time  required  to  complete  each  of  the  circuit  boards  was  also  entered 
into  the  data  base.  An  efficiency  score  was  calculated.  However,  the 
efficiency  scores  were  not  found  to  be  meaningful. 

Automotive  Mechanic 


The  scoring  scheme  for  the  Auto  Mechanics  was  the  most  straight¬ 
forward  of  the  three  skills.  The  written  test  consisted  of  61  multiple 
choice  questions.  The  examinees  wrote  the  letters  corresponding  to 
their  response  choices  on  a  one-page  answer  sheet.  The  scores  were 
calculated  by  awarding  one  point  for  each  correct  response. 

The  hands-on  test  consisted  of  81  steps.  The  test  administrator 
completed  a  step-by-step  checklist  for  each  examinee  by  recording  either 
a  pass  or  fail  in  accordance  with  the  examinee's  performance  on  that
step.  The  time  required  to  complete  the  tasks  was  also  recorded.  This 
information  was  used  in  the  calculation  of  efficiency  scores.  The 
hands-on  score  was  computed  by  awarding  one  point  for  each  step  passed. 
The  efficiency  scores  were  calculated  by  dividing  the  hands-on  score  by 
the  time  required  to  complete  the  task.  The  administrative  instructions 
stated  that  three  points  were  to  be  awarded  if  specified  subsets  of 
steps  were  completed  in  a  given  order.  However,  the  test  booklets  did 
not  give  any  information  as  to  the  order  of  actual  completion;  there¬ 
fore,  all  steps  were  unit  weighted. 
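
As a minimal sketch of the unit-weighted hands-on score and the efficiency score just described, assuming time is recorded in minutes (the text does not state the unit) and using hypothetical names and data:

```python
def hands_on_score(step_results):
    """Unit-weighted score: one point for each checklist step recorded as a pass."""
    return sum(1 for passed in step_results if passed)


def efficiency_score(score, minutes_to_complete):
    """Hands-on score divided by the time required to complete the task."""
    if minutes_to_complete <= 0:
        raise ValueError("completion time must be positive")
    return score / minutes_to_complete


# Hypothetical examinee: 70 of the 81 steps passed, finished in 140 minutes.
checklist = [True] * 70 + [False] * 11
mechanic_score = hands_on_score(checklist)
mechanic_efficiency = efficiency_score(mechanic_score, 140)
```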

Infantry  Rifleman


The  tests  for  the  Infantry  Rifleman  skill  were  much  longer  and  more
complex  than  those  for  the  other  two  skills.  The  written  test  contained
129  questions.  Responses  to  each  question  were  recorded  by  the  examinee 
in  space  provided  in  the  18-page  test  booklet.  Twenty-three  questions 
followed  a  multiple  choice  format  in  which  the  examinees  indicated  their 
response  selection  by  checking  the  space  provided  next  to  the  response 
choices.  The  remaining  items  were  matching-type  questions  for  which  the 
examinee  selected  the  appropriate  answer  from  a  list  of  several,  up  to 
79,  possible  alternatives.  The  number  corresponding  to  the  selected 
response  was  then  written  in  the  space  provided. 

The  scoring  program  for  this  test  was  extremely  complex.  Most  of 
the  questions  were  clustered  so  that  several  drew  responses  from  the 
same  matching  list.  Special  care  had  to  be  taken  not  to  award  double 
credit  for  duplicate  answers  within  subgroupings  that  did  not  allow 
them.  Several  subgroupings  required  a  specific  order  of  responses. 
Credit  was  awarded  only  until  the  order  was  broken,  regardless  of  the 
remaining  responses.  Most  correct  answers  were  awarded  one  point. 
However,  several  were  given  a  half  point,  while  still  others  were  worth 
two  or  three  points.  In  the  subgroupings  requiring  a  given  order,  a 
bonus  point  was  given  if  all  steps  were  completed  correctly.  This 
allocation  of  point  values  was,  like  the  other  skills,  designed  by  the 
job  experts. 
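
The two clustering rules described above (credit only until a required order is broken, and no double credit for duplicate answers) might be sketched as follows. This is an illustrative sketch, not the study's scoring program: the function names, the default point values, and the example answer keys are all hypothetical.

```python
def score_ordered_group(responses, key, points_per_item=1.0, bonus=1.0):
    """Score a subgrouping that requires a specific order of responses.

    Credit accrues item by item until the first response that breaks the
    keyed order; later responses earn nothing. A bonus is added only when
    every step in the subgrouping is correct.
    """
    matched = 0
    for given, expected in zip(responses, key):
        if given != expected:
            break
        matched += 1
    score = matched * points_per_item
    if matched == len(key):
        score += bonus
    return score


def score_unordered_group(responses, key, points_per_item=1.0):
    """Score a subgrouping where order is free but duplicates earn no extra credit."""
    remaining = set(key)
    score = 0.0
    for given in responses:
        if given in remaining:
            remaining.discard(given)  # each keyed answer can be credited only once
            score += points_per_item
    return score


broken_order = score_ordered_group([4, 7, 2, 9], key=[4, 7, 3, 9])
perfect_order = score_ordered_group([4, 7, 3, 9], key=[4, 7, 3, 9])
with_duplicate = score_unordered_group([12, 12, 30], key=[12, 30, 41])
```

The `points_per_item` argument stands in for the half-point, two-point, and three-point weights that the job experts assigned to particular answers.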

The  hands-on  test  covered  176  steps,  worth  a  total  of  332  points, 
including  negative  points  for  serious  errors.  These  steps  were  com¬ 
pleted  at  various  testing  stations  located  throughout  the  compound.  A 
pass  or  fail  was  recorded  for  each  step,  and  one  point  was  awarded  for 
each  step  executed  successfully.  Unlike  the  other  tests,  examinees  were 
penalized  for  not  completing  certain  items  satisfactorily.  These  penal¬ 
ties  ranged  from  one  to  five  points,  as  specified  by  the  job  experts. 

Because  the  hands-on  test  was  completed  in  various  stages  and  loca¬ 
tions,  the  tests  were  plagued  with  missing  data.  Several  examinees  were 
missing  one  or  two  subsections  of  the  test.  For  these  cases  we  esti¬ 
mated  the  scores  for  the  missing  sections,  using  a  multiple  regression 
equation  calculated  for  the  339  complete  cases.  We  felt  this  would  be 
the  best  estimate  of  performance  because  it  was  based  on  the  examinee’s 
performance  in  similar  situations. 
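
The estimation step can be illustrated with a small ordinary least squares sketch: fit a regression on the complete cases, then predict the missing subsection from the subsections an examinee did complete. The helper names and the data are hypothetical, and the choice of predictors is an assumption, since the text does not say which variables entered the regression equation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_regression(predictors, target):
    """Least squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + [float(v) for v in p] for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, predictor_row):
    """Estimated score for one examinee from the fitted coefficients."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], predictor_row))


# Hypothetical complete cases: scores on two observed subsections, and the
# subsection that is missing for some other examinees.
observed = [(10, 20), (12, 25), (15, 22), (18, 30), (20, 28)]
missing_section = [27.0, 33.0, 31.5, 41.0, 40.0]

coefficients = fit_regression(observed, missing_section)
estimated = predict(coefficients, (14, 24))  # examinee lacking the third subsection
```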

MEASURING  JOB  EXPERIENCE 

Developing  measures  of  job  experience,  just  like  developing  mea¬ 
sures  of  job  proficiency,  involved  many  decisions.  We  had  to  decide  how 
to  define  experience.  For  technical  jobs,  civilian  and  military  train¬ 
ing  and  experience  should  cumulate.  Some  experience  is  more  valuable 
than  others.  For  example,  in  electronics  repair,  workers  at  the 
organizational  level  of  maintenance  rarely  repair  circuit  boards  or 
other  components.  They  are  more  likely  to  identify  the  faulty  component,
replace  it,  and  then  send  it  to  a  higher  echelon  of  maintenance
for  repair.  The  implication  is  that  radio  repairers  assigned  to  the 
organizational  level  of  maintenance  would  receive  little  practice  in 
identifying  faulty  components  in  circuit  boards.  Radio  repairers  at 
support  levels  would  perform  these  tasks  more  frequently. 

Ground  Radio  Repair 


For  the  Ground  Radio  Repair  skill,  we  measured  job  experience  in 
terms  of  months  since  completion  of  formal  school  training,  weighted  by 
amount  and  type  of  maintenance  responsibilities  in  duty  assignments. 

The  echelons  of  maintenance  are  numbered  2  for  organizational  and  3  or  4 
for  support.  We  multiplied  months  since  completion  of  school  by  the 
number  for  the  echelon.  Finally,  we  also  multiplied  months  by  the  per¬ 
centage  time  spent  in  repairing  equipment,  as  opposed  to  performing 
other  duties.  We  assumed  that  the  higher  the  echelon  of  maintenance  the 
more  valuable  the  experience.  In  appendix  B  we  report  the  correlation 
between  measures  of  experience  and  the  performance  measures. 
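
A minimal sketch of this weighting is shown below. Combining the echelon weight and the repair-time weight in a single product is an assumption made for illustration; the text could also be read as describing two separate weighted measures. The function name and example values are hypothetical.

```python
def weighted_experience(months_since_school, echelon, fraction_time_repairing):
    """Months since school, weighted by echelon number and share of time spent on repair.

    echelon                 -- 2 for organizational level, 3 or 4 for support levels
    fraction_time_repairing -- proportion of duty time spent repairing equipment (0 to 1)
    """
    if echelon not in (2, 3, 4):
        raise ValueError("echelon of maintenance must be 2, 3, or 4")
    if not 0.0 <= fraction_time_repairing <= 1.0:
        raise ValueError("fraction of time must lie between 0 and 1")
    return months_since_school * echelon * fraction_time_repairing


# Hypothetical repairer: 12 months out of school, echelon 3, half of duty time on repair.
radio_experience = weighted_experience(12, 3, 0.5)
```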

Automotive  Mechanic 


For  the  Automotive  Mechanic  skill  we  computed  a  total  experience 
score  that  included  time,  training,  and  exposure  to  different  types  of 
equipment.  All  examinees  were  working  at  the  organizational  level,  and 
we  did  not  need  to  take  echelon  into  account.  The  examinees  marked 
whether  they  had  worked  on  six  different  types  of  equipment;  they  indi¬ 
cated  whether  they  had  paid  civilian  experience  as  a  mechanic  and 
whether  they  had  civilian  training  as  a  mechanic.  We  calculated  the 
months  they  had  worked  as  mechanics  in  the  Marine  Corps.  We  summed 
these  scores  to  obtain  a  total  experience  score. 

The  dominant  score  was  months  of  experience  as  a  mechanic  in  the 
Marine  Corps.  The  other  variables  contributed  little  to  the  correlation 
between  job  experience  and  performance  measures. 
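
To show why months of service dominate the total, the sum can be sketched as below. Scoring each yes/no item as one point is an assumption, as are the function name and the example data; the text says only that the scores were summed.

```python
def mechanic_experience(months_as_mechanic, equipment_worked_on,
                        paid_civilian_experience, civilian_training):
    """Total experience score: months plus one point per yes/no item (assumed coding)."""
    if len(equipment_worked_on) != 6:
        raise ValueError("the questionnaire listed six types of equipment")
    return (months_as_mechanic
            + sum(1 for worked in equipment_worked_on if worked)
            + int(paid_civilian_experience)
            + int(civilian_training))


# Hypothetical mechanic: 18 months in the Marine Corps, three equipment types,
# paid civilian experience but no civilian training.
mechanic_total = mechanic_experience(
    18, [True, True, False, True, False, False], True, False
)
```

Under this coding the eight yes/no items can add at most 8 points, so for anyone with more than a few months of service the total is driven mainly by months as a mechanic.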

Infantry  Rifleman 


Job  experience  for  the  Infantry  Rifleman  is  hard  to  conceptualize. 
During  peacetime,  the  primary  responsibility  of  infantrymen  is  to  train 
for  combat.  For  the  measure  of  experience  we  simply  computed  the  number 
of  months  the  examinees  had  in  the  Marine  Corps.  The  examinees  were 
asked  at  the  time  of  testing  how  many  months  ago  they  had  graduated  from 
the  Infantry  Training  School,  but  their  responses  were  too  unreliable 
for  use  in  the  statistical  analysis. 

Amount  of  experience  in  two  of  the  skills  was  controlled  inciden¬ 
tally  when  we  obtained  ASVAB  scores  of  the  examinees.  We  retained  only 
those  examinees  in  the  radio  repair  and  mechanic  samples  who  had  taken 
forms  5,  6,  or  7  of  the  ASVAB.  These  forms  were  administered  between 
1  January  1976  and  1  October  1980.  A  few  of  the  older  examinees  in
these  skills  had  joined  the  Marine  Corps  before  1976  and  some  of  the 
younger  ones  after  1  October  1980.  These  samples,  then,  tended  to  con¬ 
tain  Marines  in  their  first  term  of  enlistment,  or  early  in  their  first 
reenlistment;  and  the  mechanics  tended  to  have  at  least  1  year  of  job 
experience  in  the  Marine  Corps,  while  some  radio  repairers  had  just  a 
few  months.  (The  training  for  mechanics  takes  about  3  months  compared 
to  about  8  months  for  radio  repairers.)  For  the  rifleman  sample,  we 
retained  examinees  who  were  tested  with  forms  8,  9,  and  10  of  the  ASVAB, 
which  means  that  some  of  the  examinees  enlisted  during  fiscal  year  1981. 

DISCUSSION 

The  statements  quoted  from  the  NPRDC  report  illustrate  some  of  the 
deceptiveness  in  attempting  to  develop  and  use  hands-on  performance 
tests.  During  the  tryout,  which  is  comparable  to  a  research  environ¬ 
ment,  the  tests  and  administrators  behaved  as  expected.  Everyone  was 
confident  that  they  would  produce  accurate  scores,  which  the  NPRDC 
report  calls  scoring  reliability.  As  we  found  during  the  analysis, 
however,  scoring  accuracy  was  not  satisfactory.  Apparently  something 
changed  between  tryout  and  full-scale  administration  to  examinees. 

During  the  tryout,  the  administrators  did  not  have  a  vested  inter¬ 
est  in  how  well  the  examinees  scored.  In  fact,  because  they  developed 
the  tests,  they  probably  were  more  interested  in  making  sure  the  tests 
could  make  the  proper  discriminations.  During  full-scale  administra¬ 
tion,  however,  a  new  set  of  administrators  was  responsible  for  the 
testing.  The  new  administrators  had  no  vested  interest  in  how  good  the 
tests  were;  but  being  professional  Marines,  they  probably  had  a  vested 
interest  in  how  well  the  examinees  scored. 

The  quote  about  the  Infantry  sergeants  wanting  to  "train"  the 
examinees  typifies  the  responsibility  of  supervisors  in  the  military 
services.  During  peacetime,  the  primary  job  of  immediate  supervisors  is
to  train  junior  workers  in  the  skill.  Their  attitude  is  to  be  helpful. 
When  they  function  as  test  administrators,  we  expect  them  to  reverse 
their  attitudes  and  habits.  We  want  them  to  function  as  objective 
presenters  of  the  tests  and  evaluators  of  performance.  They  should  not, 
we  say,  intervene  in  the  behavior  of  the  examinees;  but  by  years  of 
training,  they  expect  to  intervene.  Based  on  the  differences  we  found 
among  test  administrators,  some  apparently  intervened  more  than  others. 

The  hands-on  tests  in  this  study,  even  in  the  full-scale  adminis¬ 
tration,  were  still  used  only  for  research  purposes.  No  personnel 
decisions  were  based  on  the  test  scores.  The  experience  with  ratings, 
and  other  types  of  measures,  is  that  the  scores  become  inflated  when 
they  are  used  in  personnel  decision  making.  The  inflation  of  the  hands- 
on  test  scores  for  the  radio  repair  and  mechanic  samples  we  found  in 
this  study  would  probably  be  increased  even  further  if  they  had  official 
status  in  the  Marine  Corps.  Hands-on  testing  has  a  lot  of  appeal,  but 
so  far  no  one  has  figured  out  how  to  keep  the  scoring  accurate. 

In  this  appendix,  we  present  more  details  of  how  the  statistical
analysis  was  accomplished  and  more  of  the  results.  In  the  first  part  we 
discuss  the  effects  of  deleting  cases  from  the  samples  because  of 
missing  data  (i.e.,  ASVAB  scores  and  training  grades).  We  then  show 
corrections  of  the  validity  coefficient  because  of  prior  selection  of 
examinees  on  the  basis  of  their  ASVAB  scores. 

EFFECTS  OF  DELETING  CASES  FROM  THE  SAMPLES 

In  all  samples,  the  hands-on  and  written  proficiency  tests  were 
administered  before  collection  of  the  other  data  began.  As  a  result, 
many  examinees  were  missing  one  or  more  sets  of  data  necessary  to 
complete  the  analysis.  In  this  subsection  we  present  means  and  standard 
deviations  of  proficiency  test  scores,  job  experience,  and  enlisted 
grade  for  examinees  in  each  subsample.  The  analysis  proceeded  in  the 
following  sequence: 

•  All  examinees  tested  with  the  hands-on  and  written 
proficiency  tests 

•  Examinees  with  a  complete  set  of  ASVAB  scores, 
forms  6  and  7 

•  Examinees  with  training  grades. 

The  smallest  number  of  cases  in  each  sample  was  obtained  when  we  deleted 
cases  that  did  not  have  training  grades. 

Ground  Radio  Repair 


For  the  Ground  Radio  Repair  skill,  cases  were  also  deleted  because 
some  examinees  had  training  and  experience  different  from  the  main  body 
of  radio  repairers  (specialty  number  2841).  One  group  of  19  examinees 
received  only  8  weeks  of  training  to  prepare  them  for  working  at  the 
organizational  level  of  maintenance  (specialty  number  2845).  Another 
group  of  10  examinees  was  trained  on  aviation  radio  repair  (specialty 
number  5937);  this  group  omitted  portions  of  the  written  test  that 
pertained  exclusively  to  ground  equipment.  As  a  result  their  scores 
were  not  comparable,  and  they  were  deleted. 

The  mean  and  standard  deviation  for  the  examinees  in  each  specialty 
are  shown  in  table  B-1.  The  proficiency  tests  are  reported  for  all
groups,  and  Electronics  Repair  (EL)  aptitude  composite  scores  are 
reported  when  available.  Examinees  trained  to  perform  support-level 
repair  (2841)  performed  better  than  those  trained  to  perform
organizational-level  maintenance  (2845)  on  both  the  hands-on  and  written
tests.  Their  EL  scores  were  also  higher.  Examinees  trained  to  perform 
aviation  radio  repair  (5937)  performed  about  the  same  on  the  hands-on 
test  (114.0  versus  112.0  for  those  in  specialty  2841),  but  lower  on  the 
written  test  (48.1  versus  56.1).  Their  EL  score  was  also  lower  (64.5 
versus  71.1,  equivalent  to  standard  scores  of  113  and  118).  Deleting 
examinees  in  specialty  codes  2845  and  5937  resulted  in  a  sample  more 
homogeneous  in  terms  of  training  and  experience;  the  ASVAB  validity 
coefficients  for  the  sample  restricted  to  examinees  in  specialty  2841 
can  be  interpreted  with  greater  confidence  that  they  do,  in  fact,  predict
job  performance  rather  than  being  a  function  of  different  training
programs.


TABLE  B-1

TEST  SCORES  SHOWN  BY  SKILL - GROUND  RADIO  REPAIR

                                           Skill (a)
                               Mean score           Standard deviation
Variable                     2841   5937   2845     2841   5937   2845

Hands-on test               112.0  114.0   84.6     25.7   18.5   35.6
Written test                 56.1   48.1   33.1     11.6    9.8   14.6
Electronics Repair (EL)
  Aptitude (b)               71.1   64.5   62.1      8.8    8.3   11.7
Number of cases               129     10     19

(a) Skill:
    2841 - Ground Radio Repairer, support level
    5937 - Aviation Radio Repairer
    2845 - Ground Organizational Level Repairer.

(b) Electronics Repair Aptitude Composite reported as raw scores.


In  table  B-2,  we  show  test  scores  for  the  Ground  Radio  Repair  skill
(specialty  2841  only),  when  the  number  of  cases  has  been  reduced  because 
of  missing  data.  We  report  the  means  and  standard  deviations  for  the 
hands-on  and  written  proficiency  test,  job  experience,  enlisted  grade, 
and  ASVAB  AFQT  and  EL  scores.  We  also  show  the  intercorrelation  among 
these  scores. 

Automotive  Mechanic 


All  examinees  in  the  Automotive  Mechanic  sample  received  the  same 
job  training  and  had  duty  assignments  as  mechanics.  In  table  B-3,  we
show  the  means  and  standard  deviations  for  the  hands-on  and  written
proficiency  tests,  job  experience,  and  enlisted  grade  when  the  sample
has  been  reduced  because  of  missing  data.  We  also  report  the  ASVAB
AFQT  and  Mechanical  Maintenance  (MM)  scores  when  they  are  available.
The  intercorrelations  among  these  variables  are  also  shown  in  table  B-3.

EFFECTS  OF  DELETING  CASES - GROUND  RADIO  REPAIR

5  AFQT        Not available  0.10  0.40  0.11  0.16  1.0   0.46  0.09  0.40  0.15  0.13  1.0   0.48
6  Mechanical  Not available  0.24  0.42  0.37  0.21  0.46  1.0   0.29  0.49  0.45  0.20  0.48  1.0

Infantry  Rifleman 


The  scores  for  the  Infantry  Rifleman  sample  are  shown  separately 
for  examinees  tested  with  forms  6  and  7  of  the  ASVAB  and  those  tested 
with  forms  8,  9,  and  10.  The  same  variables  are  included  as  for  the 
previous  samples  (hands-on  and  written  proficiency  tests,  job  experience,
enlisted  grade,  and  ASVAB  AFQT  and  Combat  (CO)  scores).  The
means,  standard  deviations,  and  intercorrelations  are  reported  in 
table  B-4. 

CORRECTION  FOR  RESTRICTION  IN  RANGE 

All  examinees  were  selected  for  assignment  to  these  three  skills 
only  if  they  had  qualifying  ASVAB  scores.  People  with  failing  ASVAB 
scores  were  excluded.  Because  the  intent  of  this  study  is  to  estimate 
the  validity  of  ASVAB  in  the  full  population  of  potential  recruits,  the 
validity  coefficients  must  be  corrected  for  the  effects  of  eliminating 
those  with  failing  ASVAB  scores.  The  correction  in  personnel  psychology 
is  called  "correction  for  restriction  in  range."  We  have  used  the 
multivariate  model,  which  considers  all  ASVAB  subtests  simultaneously 
[B-1].

A  brief  review  of  the  multivariate  correction  for  restriction  in 
range  may  help  clarify  what  the  corrected  correlation  coefficients 
mean.  The  effect  of  excluding  those  with  failing  ASVAB  scores  is  to 
reduce  the  values  in  the  variance-covariance  matrix  of  ASVAB  subtest 
scores.  Because  the  selection  occurred  on  the  basis  of  ASVAB  scores, 
they  are  called  the  "explicit"  selection  variables.  All  variables 
correlated  with  the  ASVAB  also  have  their  variances  and  covariances 
reduced.  The  other  variables  that  are  affected  because  they  are 
correlated  with  the  explicit  selection  variables  (ASVAB)  are  called 
"incidental"  selection  variables.  In  our  study,  the  performance 
measures  are  subject  to  incidental  selection  to  the  extent  they  are 
correlated  with  the  ASVAB,  and  their  variance  and  covariances  are 
reduced  accordingly.  The  correction  procedure  attempts  to  restore  the 
population  variances  and  covariances  for  the  complete  set  of  variables — 
explicit  and  incidental — just  as  though  there  had  been  no  explicit 
selection  on  the  ASVAB. 

The  correction  procedure  requires  that  we  know  the  population 
variances  and  covariances  among  one  set  of  variables;  in  this  case  we 
know  the  population  values  for  the  ASVAB.  The  population  for  forms  6 
and  7  of  the  ASVAB  is  a  sample  of  applicants  for  enlistment  tested  in 
January and February 1980. For forms 8, 9, and 10 the population is
based on the nationally representative sample of youth used to construct
a new ASVAB score scale [B-2].
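
In matrix form, the correction can be sketched as follows, assuming the multivariate model referred to here is the standard Pearson-Lawley correction. The symbols are introduced only for illustration and do not appear in the report: x denotes the explicit selection variables (ASVAB subtests), y the incidental variables (performance measures), V a covariance matrix observed in the selected sample, and \Sigma the corresponding population matrix, of which only \Sigma_{xx} is known in advance.

```latex
\begin{aligned}
\Sigma_{xy} &= \Sigma_{xx} V_{xx}^{-1} V_{xy}, \\
\Sigma_{yy} &= V_{yy} - V_{yx} V_{xx}^{-1} V_{xy}
              + V_{yx} V_{xx}^{-1} \Sigma_{xx} V_{xx}^{-1} V_{xy}, \\
r_{x_i y_j} &= \frac{(\Sigma_{xy})_{ij}}
                    {\sqrt{(\Sigma_{xx})_{ii}\,(\Sigma_{yy})_{jj}}} .
\end{aligned}
```

The first line carries the sample regression of y on x over to the population covariance of x; the second adds the unchanged residual covariance of y to the variance explained by x in the full population; the third converts the result into a corrected validity coefficient.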

Three  assumptions  are  required  to  use  the  multivariate  correction 
procedure:

•  The  regression  of  the  incidental  variables  on  the  explicit 
variables  is  identical  in  both  selected  and  unselected 
(full  range)  groups. 

•  The  standard  errors  of  estimate  for  predicting  the 
incidental  variables  are  the  same  in  both  groups. 

•  The  correlations  among  the  incidental  variables  with  the 
explicit  variables  partialled  out  are  the  same  in  both 
groups.

What  these  assumptions  require  is  that  the  score  distributions  be 
affected  only  by  truncation  of  the  explicit  variables  at  the  point  of 
selection.  The  correction  then  extends  the  multivariate  regression  line 
to  cover  the  full  range  of  scores.  If  the  assumptions  are  met,  then  the 
correction  is  exact.  In  practice,  of  course,  selection  is  rarely,  if 
ever,  based  solely  on  test  scores,  and  the  correction,  therefore,  is  an 
approximation.  The  correction  procedure  works  reasonably  well  for 
military  samples,  and  the  corrected  validity  coefficients  are  closer  to 
the  population  values  than  those  based  on  selected  samples. 
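
The correction described above can be sketched numerically. The code below is a minimal illustration, assuming the multivariate model is the standard Pearson-Lawley correction; the function names and the one-predictor example values are hypothetical and are not taken from this study. The regression weights and the residual covariance are estimated in the selected sample and, under the three assumptions listed above, are treated as unchanged in the population.

```python
import numpy as np


def lawley_correction(v_xx, v_xy, v_yy, s_xx):
    """Return population covariances (s_xy, s_yy) under the Pearson-Lawley model.

    v_xx, v_xy, v_yy: covariances observed in the selected (restricted) sample,
        with x the explicit selection variables and y the incidental variables.
    s_xx: known population covariance matrix of the explicit variables.
    """
    v_xx = np.asarray(v_xx, dtype=float)
    v_xy = np.asarray(v_xy, dtype=float)
    v_yy = np.asarray(v_yy, dtype=float)
    s_xx = np.asarray(s_xx, dtype=float)

    # Regression weights of y on x, assumed identical in both groups.
    b = np.linalg.solve(v_xx, v_xy)
    # Residual covariance of y given x, assumed identical in both groups.
    resid = v_yy - v_xy.T @ b

    s_xy = s_xx @ b
    s_yy = resid + b.T @ s_xx @ b
    return s_xy, s_yy


def corrected_validities(v_xx, v_xy, v_yy, s_xx):
    """Return corrected correlations between each x (rows) and each y (columns)."""
    s_xx = np.asarray(s_xx, dtype=float)
    s_xy, s_yy = lawley_correction(v_xx, v_xy, v_yy, s_xx)
    sd_x = np.sqrt(np.diag(s_xx))
    sd_y = np.sqrt(np.diag(s_yy))
    return s_xy / np.outer(sd_x, sd_y)


# Hypothetical illustration: one predictor whose standard deviation is halved
# by selection (variance 1 in the sample, 4 in the population) and an observed
# validity of 0.30 in the selected sample.
r_corrected = corrected_validities([[1.0]], [[0.3]], [[1.0]], [[4.0]])[0, 0]
print(round(r_corrected, 2))  # about 0.53
```

With a single predictor the result agrees with the familiar univariate formula for direct range restriction; with several ASVAB subtests as explicit variables, the same two matrix expressions correct all validity coefficients at once.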

In  table  B-5  we  present  the  population  correlation  matrices  and 
standard  deviations  for  the  ASVAB  subtests  obtained  at  AFEES.  We  show 
the  corrected  validity  coefficients  of  the  ASVAB  aptitude  composites, 
using  test  scores  obtained  at  Recruit  Training  Depots.  In  appendix  C  we 
present  the  complete  correlation  matrices  for  ASVAB  scores  obtained  at 
AFEES  (Armed  Forces  Examining  and  Entrance  Stations*)  and  at  the  Depots. 

In table B-6 we show the uncorrected and corrected validity
coefficients and standard deviations for the final Ground Radio Repair
sample (59 cases). In table B-7 we show the same data for the final
Automotive Mechanic sample (131 cases). In table B-8 we show the
results for the final sample of Infantry Rifleman. Part A includes
those tested with forms 8, 9, and 10 of the ASVAB (53 cases). Part B
includes those tested with forms 6 and 7 of the ASVAB (140 cases); no
training grades are included for those tested with forms 6 and 7 of the
ASVAB. Part C includes the combined sample of Infantry Rifleman
(241 cases); we combined cases tested with forms 6, 7, 8, 9, and 10 of
the ASVAB, and used only those ASVAB subtests that had parallel content
across the forms. For the combined sample, we used the 1980 Youth
Population as the base matrix for making the correction.


*AFEES are now called Military Entrance Processing Stations.
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POPULATION  CORRELATION  MATRICES  AND  STANDARD  DEVIATION  OF  ASVAB  SUBTESTS 
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TABLE  B-6 


ASVAB VALIDITY COEFFICIENTS(a) - GROUND RADIO REPAIR(b)


Part A: Uncorrected Coefficients

                                    Criterion variable
                             Hands-on  Written    Total Training
ASVAB score                      test     test     test    grade

Subtest(c)
  General Information              12       21       22       22
  Numerical Operations             03       22       16       26
  Attention to Detail              18       12       20       05
  Word Knowledge                   31       38       46       19
  Arithmetic Reasoning             40       22       42       38
  Space Perception                 44      -05       28       10
  Mathematics Knowledge            32       24       37       27
  Electronics Information          19       26       30       23
  Mechanical Comprehension         27       20       31       17
  General Science                  23       43       43       10
  Shop Information                 07       23       20       07
  Automotive Information           33       11       30       29
  Mechanical Interest             -10       09      -01       05
  Attentiveness Interest           05       28       21       10
  Electronics Interest            -15       11      -03      -06
  Combat Interest                 -05      -16      -14      -17

Aptitude Composite(d)
  Clerical                         32       25       38       23
  Combat                           30       07       25       16
  Electronics Repair               21       34       36       43
  Field Artillery                  25       29       36       35
  General Maintenance              25       24       32       26
  General Technical                36       33       46       30
  Mechanical Maintenance           17       20       25       35


Part B: Corrected Coefficients

                                    Criterion variable
                             Hands-on  Written    Total Training
ASVAB score                      test     test     test    grade

Subtest(c)
  General Information              47       61       62       60
  Numerical Operations             41       57       56       60
  Attention to Detail              36       26       36       22
  Word Knowledge                   65       71       78       56
  Arithmetic Reasoning             68       68       78       71
  Space Perception                 66       35       59       44
  Mathematics Knowledge            62       67       74       68
  Electronics Information          58       67       71       65
  Mechanical Comprehension         53       62       66       56
  General Science                  60       73       76       50
  Shop Information                 43       62       60       46
  Automotive Information           53       47       57       53
  Mechanical Interest             -17       08      -06       10
  Attentiveness Interest           28       22       29       19
  Electronics Interest            -01       16       09       12
  Combat Interest                  25       17       25       12

Aptitude Composite(d)
  Clerical                         62       61       71       57
  Combat                           59       52       64       57
  Electronics Repair               59       73       76       75
  Field Artillery                  62       70       76       72
  General Maintenance              62       69       75       65
  General Technical                68       69       79       62
  Mechanical Maintenance           50       62       64       69

(a) Decimals omitted.
(b) Number of cases is 59.
(c) Tests given at AFEES.
(d) Tests given at Depots.
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TABLE  B-7 


ASVAB VALIDITY COEFFICIENTS(a) - AUTOMOTIVE MECHANIC(b)


Part A: Uncorrected Coefficients

                                    Criterion variable
                             Hands-on  Written    Total Training
ASVAB score                      test     test     test    grade

Subtest(c)
  General Information              30       28       35       47
  Numerical Operations             09       08       10       19
  Attention to Detail              09       01       06       08
  Word Knowledge                   15       35       31       37
  Arithmetic Reasoning             11       19       18       37
  Space Perception                 13       20       20       08
  Mathematics Knowledge            10       15       15       31
  Electronics Information          30       35       39       59
  Mechanical Comprehension         33       34       40       52
  General Science                  12       33       27       44
  Shop Information                 40       27       40       54
  Automotive Information           50       41       55       61
  Mechanical Interest              31       23       33       40
  Attentiveness Interest          -14      -18      -20      -13
  Electronics Interest             01      -09      -05      -04
  Combat Interest                 -08       00      -05       17

Aptitude Composite(d)
  Clerical                         23       23       28       46
  Combat                           32       34       40       47
  Electronics Repair               33       44       47       68
  Field Artillery                  30       37       41       64
  General Maintenance              45       50       57       73
  General Technical                23       32       34       58
  Mechanical Maintenance           49       49       60       73


Part B: Corrected Coefficients

                                    Criterion variable
                             Hands-on  Written    Total Training
ASVAB score                      test     test     test    grade

Subtest(c)
  General Information              38       47       50       63
  Numerical Operations             28       35       37       47
  Attention to Detail              13       16       17       20
  Word Knowledge                   37       58       57       69
  Arithmetic Reasoning             34       50       50       66
  Space Perception                 32       45       45       49
  Mathematics Knowledge            32       47       47       65
  Electronics Information          45       57       60       77
  Mechanical Comprehension         45       55       59       71
  General Science                  36       57       55       69
  Shop Information                 49       46       56       68
  Automotive Information           57       55       65       71
  Mechanical Interest              21       13       19       24
  Attentiveness Interest           19       11       17       23
  Electronics Interest             22       11       19       24
  Combat Interest                  01       14       09       25

Aptitude Composite(d)
  Clerical                         39       48       51       66
  Combat                           42       54       57       65
  Electronics Repair               47       63       65       82
  Field Artillery                  45       59       62       80
  General Maintenance              54       66       71       84
  General Technical                39       55       55       75
  Mechanical Maintenance           56       65       71       83

(a) Decimals omitted.
(b) Number of cases is 131.
(c) Tests given at AFEES.
(d) Tests given at Depots.
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TABLE  B-8 


ASVAB VALIDITY COEFFICIENTS(a) - INFANTRY RIFLEMAN


Part A: Examinees Tested with ASVAB 8/9/10(b), Uncorrected Coefficients

                                    Criterion variable
                             Hands-on  Written    Total Training
ASVAB score                      test     test     test    grade

Subtest(c)
  General Science                  41       56       57       22
  Arithmetic Reasoning             26       44       41       25
  Word Knowledge                   38       63       59       31
  Paragraph Comprehension          48       43       53       21
  Numerical Operations             14       22       21      -03
  Coding Speed                     13       30       25       08
  Auto/Shop                        42       28       41       13
  Mathematics Knowledge            45       51       56       31
  Mechanical Comprehension         50       40       53       30
  Electronics Information          44       47       54       29

Aptitude Composite(d)
  Clerical                         20       31       30      -04
  Combat                           40       48       52       13
  Electronics Repair               41       53       55       18
  Field Artillery                  43       54       57       22
  General Maintenance              42       48       53       15
  General Technical                37       55       54       21
  Mechanical Maintenance           43       43       50       16


Part A: Examinees Tested with ASVAB 8/9/10(b), Corrected Coefficients

                                    Criterion variable
                             Hands-on  Written    Total Training
ASVAB score                      test     test     test    grade

Subtest(c)
  General Science                  55       72       72       37
  Arithmetic Reasoning             44       66       63       46
  Word Knowledge                   52       81       76       46
  Paragraph Comprehension          61       68       73       35
  Numerical Operations             50       59       62       23
  Coding Speed                     40       57       55       19
  Auto/Shop                        45       38       46       26
  Mathematics Knowledge            53       65       67       43
  Mechanical Comprehension         57       52       61       38
  Electronics Information          52       66       67       45

Aptitude Composite(d)
  Clerical                         41       51       52       08
  Combat                           58       69       72       29
  Electronics Repair               56       74       74       38
  Field Artillery                  57       74       74       40
  General Maintenance              56       68       70       32
  General Technical                53       77       74       41
  Mechanical Maintenance           53       60       64       32


Part B: Examinees Tested with ASVAB 6/7(e)

                                             Criterion variable
                                    Uncorrected                 Corrected
                             Hands-on  Written    Total Hands-on  Written    Total
ASVAB score                      test     test     test     test     test     test

Subtest
  General Information              26       45       41       50       65       64
  Numerical Operations             07       26       19       38       53       50
  Attention to Detail              01      -02       00       16       14       17
  Word Knowledge                   32       38       41       58       68       70
  Arithmetic Reasoning             28       32       36       58       66       69
  Space Perception                 22       23       27       47       50       54
  Mathematics Knowledge            32       45       45       55       65       67
  Electronics Information          30       31       36       54       59       63
  Mechanical Comprehension         40       34       44       60       60       67
  General Science                  37       43       47       59       68       71
  Shop Information                 31       39       41       48       59       59
  Automotive Information           25       32       33       47       54       56
  Mechanical Interest              06       00       03       14       07       12
  Attentiveness Interest          -07       05      -02       18       29       26
  Electronics Interest             07       05       08       24       23       26
  Combat Interest                  22       24       27       30       33       35

Aptitude Composite
  Clerical                         24       39       37       53       65       66
  Combat                           31       28       35       53       54       59
  Electronics Repair               41       52       55       62       72       75
  Field Artillery                  38       52       53       61       72       74
  General Maintenance              47       51       58       66       71       76
  General Technical                43       58       59       64       77       78
  Mechanical Maintenance           41       37       46       60       60       67


Part C: Pooled Groups(g)

                                             Criterion variable
                                    Uncorrected                 Corrected
                             Hands-on  Written    Total Hands-on  Written    Total
ASVAB score                      test     test     test     test     test     test

  General Science                  37       49       50       50       67       66
  Arithmetic Reasoning             25       37       36       45       61       60
  Word Knowledge                   32       48       47       46       67       64
  Numerical Operations             09       23       18       29       49       44
  Auto/Shop                        36       41       45       47       52       56
  Mathematics Knowledge            32       44       44       45       61       60
  Mechanical Comprehension         41       37       46       52       55       60
  Electronics Information          43       41       43       48       60       62

  Combat Aptitude                  37       52       51       51       69       68

(a) Decimals omitted.
(b) Number of cases is 53.
(c) Tests given at AFEES.
(d) Tests given at Depots.
(e) Number of cases is 140.
(f) No training grades available for this group.
(g) Number of cases is 241, tested with forms 6, 7, 8, 9, and 10.
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APPENDIX  C 


DETAILED  STATISTICAL  TABLES 


In this appendix we present a complete set of intercorrelation
matrices. In tables C-1, C-2, and C-3 we show the intercorrelations
among the parts of the hands-on and written proficiency tests for the
Ground Radio Repair, Automotive Mechanic, and Infantry Rifleman skills,
respectively. All examinees in each skill were included when computing
these intercorrelation matrices. The Ground Radio Repair sample
includes only examinees with specialty number 2841. (See appendix B for
a description of this specialty number.) The variables are described
for each table.

In tables C-4 and C-5, we present the uncorrected and corrected
intercorrelation matrices for the final Ground Radio Repair and
Automotive Mechanic samples, respectively. (See appendix B for a
description of the procedure to correct correlation coefficients for
restriction in range.) Part A of each table contains the uncorrected
coefficients, and part B the corrected coefficients. In tables C-6,
C-7, and C-8 we present the uncorrected and corrected intercorrelation
matrices for the Infantry Rifleman samples. Table C-6 contains results
for examinees tested with forms 6 and 7 of the ASVAB at time of
enlistment; table C-7 is for examinees tested with forms 8, 9, and 10 of
the ASVAB. Table C-8 contains the uncorrected and corrected
intercorrelation matrices for the combined sample of riflemen, using the
ASVAB subtests that are common to forms 6, 7, 8, 9, and 10.
TABLE C-1


INTERCORRELATION AMONG PARTS OF HANDS-ON
AND WRITTEN PROFICIENCY TESTS

GROUND RADIO REPAIR

Part A: Hands-on Test

Variable        Mean   Standard dev   Cases

BOARD1       12.3636         3.1391     154
BOARD2       12.2338         3.5199     154
BOARD3       10.3571         4.3064     154
BOARD4       11.1688         4.4557     154
BOARD5       11.7013         3.5462     154
BOARD6       10.6948         3.7422     154
BOARD7       11.3377         4.2105     154
BOARD8        9.5844         5.2427     154
BOARD9       11.5065         4.5060     154
BOARD10       9.8312         5.3636     154
SYMSC        18.7078         1.7930     154
CIRSC        35.8766         5.4836     154
COMPSC       56.1948        20.4572     154
SYM1          1.8312         0.5578     154
SYM2          1.8701         0.4944     154
SYM3          1.8571         0.5168     154
SYM4          1.8052         0.5949     154
SYM5          1.9740         0.2272     154
SYM6          2.0000         0.0000     154
SYM7          1.9091         0.4100     154
SYM8          1.9091         0.4100     154
SYM9          1.8701         0.4944     154
SYM10         1.6818         0.7294     154
CIR1          3.9091         0.5758     154
CIR2          3.7662         0.8841     154
CIR3          3.5390         1.1999     154
CIR4          3.4026         1.3745     154
CIR5          3.8961         0.5503     154
CIR6          3.9226         0.4865     154
CIR7          3.6364         1.1537     154
CIR8          2.9870         1.6996     154
CIR9          3.5714         1.2306     154
CIR10         3.2403         1.5679     154
COMP1         6.6234         2.8699     154
COMP2         6.5974         2.8480     154
COMP3         4.9610         3.4846     154
COMP4         5.9610         3.3174     154
COMP5         5.3312         3.3758     154
COMP6         4.7662         3.6314     154
COMP7         5.7922         3.4540     154
COMP8         4.6883         3.8083     154
COMP9         6.0649         3.3794     154
COMP10        4.9091         3.7857     154
SCORE       110.7792        25.7345     154


Part A: Hands-on Test

Variable    Description

BOARD1      Symptom plus circuit plus component score for BOARD 1
BOARD2      Symptom plus circuit plus component score for BOARD 2
BOARD3      Symptom plus circuit plus component score for BOARD 3
BOARD4      Symptom plus circuit plus component score for BOARD 4
BOARD5      Symptom plus circuit plus component score for BOARD 5
BOARD6      Symptom plus circuit plus component score for BOARD 6
BOARD7      Symptom plus circuit plus component score for BOARD 7
BOARD8      Symptom plus circuit plus component score for BOARD 8
BOARD9      Symptom plus circuit plus component score for BOARD 9
BOARD10     Symptom plus circuit plus component score for BOARD 10
SYMSC       Symptom score, sum of BOARDS 1 through 10
CIRSC       Circuit score, sum of BOARDS 1 through 10
COMPSC      Component score, sum of BOARDS 1 through 10
SYM1        Identify faulty symptom, BOARD 1
SYM2        Identify faulty symptom, BOARD 2
SYM3        Identify faulty symptom, BOARD 3
SYM4        Identify faulty symptom, BOARD 4
SYM5        Identify faulty symptom, BOARD 5
SYM6        Identify faulty symptom, BOARD 6
SYM7        Identify faulty symptom, BOARD 7
SYM8        Identify faulty symptom, BOARD 8
SYM9        Identify faulty symptom, BOARD 9
SYM10       Identify faulty symptom, BOARD 10
CIR1        Identify faulty circuit, BOARD 1
CIR2        Identify faulty circuit, BOARD 2
CIR3        Identify faulty circuit, BOARD 3
CIR4        Identify faulty circuit, BOARD 4
CIR5        Identify faulty circuit, BOARD 5
CIR6        Identify faulty circuit, BOARD 6
CIR7        Identify faulty circuit, BOARD 7
CIR8        Identify faulty circuit, BOARD 8
CIR9        Identify faulty circuit, BOARD 9
CIR10       Identify faulty circuit, BOARD 10
COMP1       Identify faulty component, BOARD 1
COMP2       Identify faulty component, BOARD 2
COMP3       Identify faulty component, BOARD 3
COMP4       Identify faulty component, BOARD 4
COMP5       Identify faulty component, BOARD 5
COMP6       Identify faulty component, BOARD 6
COMP7       Identify faulty component, BOARD 7
COMP8       Identify faulty component, BOARD 8
COMP9       Identify faulty component, BOARD 9
COMP10      Identify faulty component, BOARD 10
SCORE       Sum of BOARD 1 through 10


Part B: Written and Hands-on Test

Variable        Mean   Standard dev   Cases

GENSC        15.4091         3.1366     154
METERSC      19.3831         4.4298     154
OSCILLSC     11.5779         5.6647     154
PART2SC       8.9610         4.3112     154
PART1SC      46.3701         9.5116     154
SYMSC        18.7078         1.7930     154
CIRSC        35.8766         5.4336     154
COMPSC       56.1948        20.4572     154
BOARD1       12.3636         3.1391     154
BOARD2       12.2338         3.5199     154
BOARD3       10.3571         4.3064     154
BOARD4       11.1688         4.4557     154
BOARD5       11.7013         3.5482     154
BOARD6       10.6948         3.7422     154
BOARD7       11.3377         4.2105     154
BOARD8        9.5844         5.2427     154
BOARD9       11.5065         4.5060     154
BOARD10       9.8312         5.3636     154
SCORE       110.7792        25.7345     154
OVERALL     166.1104        29.5685     154


Part B: Written and Hands-on Test

Variable    Description

GENSC       General Electronics score, written
METERSC     Meters score, written
OSCILLSC    Oscilloscope score, written
PART2SC     Troubleshooting UIQ-10 Amplifier, written
PART1SC     GENSC plus METERSC plus OSCILLSC
SYMSC       Symptom score, hands-on
CIRSC       Circuit score, hands-on
COMPSC      Component score, hands-on
BOARD1      BOARD 1 score, hands-on
BOARD2      BOARD 2 score, hands-on
BOARD3      BOARD 3 score, hands-on
BOARD4      BOARD 4 score, hands-on
BOARD5      BOARD 5 score, hands-on
BOARD6      BOARD 6 score, hands-on
BOARD7      BOARD 7 score, hands-on
BOARD8      BOARD 8 score, hands-on
BOARD9      BOARD 9 score, hands-on
BOARD10     BOARD 10 score, hands-on
SCORE       Score on hands-on test
OVERALL     PART1SC plus PART2SC plus SCORE
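
Because PART1SC and OVERALL are defined as sums of other variables, their means should equal the sums of the component means, apart from rounding in the fourth decimal. The short sketch below applies that check to the means printed in table C-1, part B; the dictionary layout and the helper name `is_additive` are illustrative and not part of the report.

```python
# Means as printed in table C-1, part B (154 cases each).
means = {
    "GENSC": 15.4091,
    "METERSC": 19.3831,
    "OSCILLSC": 11.5779,
    "PART1SC": 46.3701,
    "PART2SC": 8.9610,
    "SCORE": 110.7792,
    "OVERALL": 166.1104,
}


def is_additive(total, parts, tol=5e-4):
    """True if the mean of `total` equals the summed means of `parts` within rounding."""
    return abs(means[total] - sum(means[p] for p in parts)) <= tol


# PART1SC = GENSC + METERSC + OSCILLSC
part1_ok = is_additive("PART1SC", ["GENSC", "METERSC", "OSCILLSC"])
# OVERALL = PART1SC + PART2SC + SCORE
overall_ok = is_additive("OVERALL", ["PART1SC", "PART2SC", "SCORE"])
print(part1_ok, overall_ok)
```

A failed check on a composite would point to a misprinted or misread mean in one of its rows.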
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TABLE  C-2 


INTERCORRELATION AMONG PARTS OF HANDS-ON
AND WRITTEN PROFICIENCY TESTS

AUTOMOTIVE MECHANIC

Variable        Mean   Standard dev   Cases

RANK          3.1597         0.8938     263
MONOJT        0.4601         2.4396     263
MONEXP       17.3336        16.4767     263
PDEXP         3.8555         9.8466     263
HIGHSCH       4.6426         9.1252     263
TRADESCH      0.5285         2.6280     263
OTHEREX       1.0837         6.7156     263
M35           0.6996         0.4593     263
M54           0.5970         0.4914     263
M151          0.8821         0.3347     263
M561          0.3840         0.4373     263
M880          0.7452         0.4620     263
M813          0.2129         0.4102     263
FUEL          5.9620         1.9572     263
STEER         1.8251         0.8288     263
COOL          5.5894         1.3671     263
M54W         20.0304         5.7591     263
WRITTEN      33.4068         7.3623     263
COMPSC        7.4601         0.9356     263
COILSC       16.7072         2.3476     263
PLUGSC        2.1863         0.4780     263
VACSC         3.7909         0.6465     263
PTIMESC       6.5235         1.2893     263
ERO           7.4297         1.1231     263
BATTSC        9.3042         1.4849     263
ALTSC         7.0190         1.8000     263
WHEELSC      11.7110         2.4166     263
HOSCORE      72.1369         7.7265     263


Variable    Description

RANK        Enlisted grade
MONOJT      Months of on-the-job training
MONEXP      Months of experience as automotive mechanic in Marine Corps
PDEXP       Months of paid experience as mechanic in civilian life
HIGHSCH     High school courses in mechanics
OTHEREX     Other experience as mechanic in civilian life
M35         Experience with vehicle in Marine Corps
M54         Experience with vehicle in Marine Corps
M151        Experience with vehicle in Marine Corps
M561        Experience with vehicle in Marine Corps
M880        Experience with vehicle in Marine Corps
M813        Experience with vehicle in Marine Corps
FUEL        Score on Fuel system written test items
STEER       Score on Steering system written test items
COOL        Score on Cooling system written test items
M54W        Score on M54 (multifuel engine) truck, written test items
WRITTEN     Total written test score
COMPSC      Compression score on hands-on test
COILSC      Coil score on hands-on test
PLUGSC      Sparkplug score on hands-on test
VACSC       Vacuum test score on hands-on test
PTIMESC     Precision timing score on hands-on test
ERO         Equipment repair order score on hands-on test
BATTSC      Battery test score on hands-on test
ALTSC       Alternator test score on hands-on test
WHEELSC     Wheel and brake score on hands-on test
HOSCORE     Total hands-on test score
TABLE C-3


INTERCORRELATION AMONG PARTS OF HANDS-ON
AND WRITTEN PROFICIENCY TESTS

INFANTRY RIFLEMAN

Variable        Mean   Standard dev   Cases

PARTASC       4.5026         1.3768     384
PARTBSC       2.C2SI         0.8057     384
PARTCSC      12.0964         3.1196     384
PARTDSC       2.2422         1.7851     384
PARTESC      11.5036         3.4184     384
PARTFSC       3.7437         2.9331     384
PARTGSC       3.1719         1.4203     384
PARTHSC       7.43*9         2.6616     384
PARTISC       5.2930         1.0813     384
WRITTEN      56.2214        10.7177     384
MAPCOMP      14.2406        15.2179     384
FIRSTAID     16.5120         5.7113     384
FIRETEAM     14.5156         5.6203     384
ANTITK1      12.8380         7.2860     384
ANTITK2      11.5770         5.2593     384
ANTITK       24.4150        10.0213     384
FIRINGSC     49.0260        22.0142     384
HANDSON     138.7153        37.5857     384
RIFEXP        2.3464         t.4765     384
EXP           2.2057         0.5123     384
ITSEXP        2.2344         0.5814     384
RANK          1.9609         0.7377     384
TIMESERV     11.1094         5.7287     384
MONITS        6.3672         8.2837     384
MONRIF        2.6054         4.6690     384


Variable    Description

Part A      Infantry assignments
Part B      Rifleman duties
Part C      Weapon characteristics
Part D      Handling prisoners - 1
Part E      Handling prisoners - 2
Part F      Identify acronyms
Part G      Definition of acronyms
Part H      Nuclear, biological, chemical defense
Part I      Identification of targets
Written     Written test score sum of Parts A through I
MAPCOMP     Map and compass test
FIRSTAID    Perform first aid on dummies
FIRETEAM    Signals, formations, movement of fire team
ANTITK1     Locate and neutralize mine
ANTITK2     Set up antipersonnel mine
ANTITK      Sum of ANTITK 1 and 2
FIRINGSC    Fire rifle at 23 pop-up targets
HANDS-ON    Hands-on test score sum of hands-on parts
RIFEXP      Self-report of experience as rifleman
EXP         Self-report of experience in Marine Corps
ITSEXP      Self-report of experience since Infantry Training School
RANK        Enlisted grade
TIMESERV    Self-report of months in service
MONITS      Self-report of months since Infantry Training School
MONRIF      Self-report of months as rifleman
TABLE C-4


INTERCORRELATION MATRICES FOR FINAL SAMPLE
GROUND RADIO REPAIR


Part A: Uncorrected Correlation Matrix

Variable        Mean   Standard deviation

ARGI          11.712          2.364
ARNO          37.695          6.906
ARAD          15.763          3.213
ARWK          25.356          4.122
ARAR          17.169          2.379
ARSP          16.136          3.277
ARMK          16.797          2.462
AREI          23.492          3.757
ARMC          14.763          2.996
ARGS          15.390          2.754
ARSI          16.169          3.147
ARAI          13.898          4.369
ARCM          12.576          4.680
ARCA           9.932          2.434
ARCE          11.035          4.149
ARCC          19.339          4.361
DRCO          82.949          8.603
DRFA          76.051         10.061
DRMM          81.542         12.638
DRGM          60.458          8.724
DRCL          67.492          8.508
DRGT          41.559          5.200
DREL          69.102          9.121
DRSC          72.949          7.895
DRST          46.203          6.635
DROF          35.398          6.761
DRGCT         57.831          5.954
WRSTAND       43.345          9.260
HOCST         50.733         10.005
PROFICST      43.964          9.793
FCGST         43.996         10.246
HOCTOTST      49.152          9.635

N of cases = 59


Variable    Description

ARGI        General Information subtest raw score, tested at AFEES(a)
ARNO        Numerical Operations subtest raw score, tested at AFEES
ARAD        Attention to Detail subtest raw score, tested at AFEES
ARWK        Word Knowledge subtest raw score, tested at AFEES
ARAR        Arithmetic Reasoning subtest raw score, tested at AFEES
ARSP        Space Perception subtest raw score, tested at AFEES
ARMK        Mathematics Knowledge subtest raw score, tested at AFEES
AREI        Electronics Information subtest raw score, tested at AFEES
ARMC        Mechanical Comprehension subtest raw score, tested at AFEES
ARGS        General Science subtest raw score, tested at AFEES
ARSI        Shop Information subtest raw score, tested at AFEES
ARAI        Automotive Information subtest raw score, tested at AFEES
ARCM        Mechanical Interest subtest raw score, tested at AFEES
ARCA        Attentiveness Interest subtest raw score, tested at AFEES
ARCE        Electronics Interest subtest raw score, tested at AFEES
ARCC        Combat Interest subtest raw score, tested at AFEES
DRCO        Combat aptitude composite raw score, tested at Depot
DRFA        Field Artillery composite raw score, tested at Depot
DRMM        Mechanical Maintenance composite raw score, tested at Depot
DRGM        General Maintenance composite raw score, tested at Depot
DRCL        Clerical composite raw score, tested at Depot
DRGT        General Technical composite raw score, tested at Depot
DREL        Electronics Repair composite raw score, tested at Depot
DRSC        Surveillance/Communications composite raw score, tested at
            Depot
DRST        Skilled Technical composite raw score, tested at Depot
DROF        Operators/Food composite raw score, tested at Depot
DRGCT       AFQT raw score, tested at Depot
WRSTAND     Written test, standardized
HOCST       Hands-on test, standardized
PROFICST    WRSTAND plus HOCST plus FCGST, standardized
FCGST       Final Course Grade, standardized
HOCTOTST    HOCST plus WRSTAND, standardized

(a) Forms 6 and 7 of the ASVAB.


Part B: Corrected Correlation Matrix

Variable        Mean   Standard deviation

ARGI          11.712          3.200
ARNO          37.695         10.500
ARAD          15.763          4.000
ARWK          25.356          7.000
ARAR          17.170          4.700
ARSP          16.136          4.200
ARMK          16.797          4.900
AREI          23.492          5.700
ARMC          14.763          4.500
ARGS          15.390          4.300
ARSI          16.170          4.200
ARAI          13.898          4.800
ARCM          12.576          4.200
ARCA           9.932          3.000
ARCE          11.035          4.600
ARCC          19.339          3.800
DRCO          82.949         11.916
DRFA          76.051         20.613
DRMM          81.542         18.590
DRGM          60.458         15.347
DRCL          67.492         13.516
DRGT          41.559          8.969
DREL          69.102         18.877
DRSC          72.949         15.439
DRST          46.203         14.142
DROF          35.393          9.977
DRGCT         57.831         11.304
WRSTAND       43.345         13.765
HOCST         50.783         14.576
PROFICST      49.964         17.139
FCGST         48.996         14.679
HOCTOTST      49.152         16.397

N of cases = 59
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TABLE  C-5 


INTERCORRELATION MATRICES FOR FINAL SAMPLE
AUTOMOTIVE MECHANIC


Part A: Uncorrected Intercorrelation Matrix

Variable        Mean   Standard deviation

ARGI           9.634          2.684
ARNO          30.153          8.603
ARAD          14.008          3.551
ARWK          13.122          4.963
ARAR          11.733          3.444
ARSP          13.603          3.592
ARMK          11.191          3.355
AREI          19.298          4.444
ARMC          10.886          3.403
ARGS          10.496          3.257
ARSI          14.008          3.792
ARAI          12.267          4.804
ARCM          13.786          3.827
ARCA           9.924          2.916
ARCE           9.061          4.066
ARCC          18.908          3.902
DRCO          71.412         11.923
DRFA          59.870         11.694
DRMM          69.527         16.702
DRGM          45.389         11.304
DRCL          54.443          8.980
DRGT          29.931          6.747
DREL          51.122         11.538
DRSC          54.954          9.755
DRST          32.496          7.997
DROF          31.595          7.486
DRGCT         43.916          7.473
WRSTAND       50.235          9.933
HOCEFFST      49.620          9.935
PROFICST      50.247         10.327
FCGSTAND      50.709         10.016
PROFIC2S      49.973          9.983
WRFCG        100.944         17.568

N of cases = 131
TABLE C-5 (Cont'd)


Variable 


Description 


ARGI 

ARNO 

ARAD 

ARWK 

ARAR 

ARSP 

ARMK 

AREI 

ARMC 

ARGS 

ARSI 

ARAI 

ARCM 

ARCA 

ARCE 

ARCC 

DRCO 

DRFA 

DRMM 

DRGM 

DRCL 

DRGT 

DREL 

DRSC 

DRST 

DROF 

DRGCT 

WRSTAND 

HOC 

HOCEFFST 

PROFICST 

FCGSTAND 

PROFIC2S 

WRFCG 


General  Information  subtest  raw  score,  tested  at  AFEESa 
Numerical  Operations  subtest  raw  score,  tested  at  AFEES 
Attention  to  Detail  subtest  raw  score,  tested  at  AFEES 
Word  Knowledge  subtest  raw  score,  tested  at  AFEES 
Arithmetic  Reasoning  subtest  raw  score,  tested  at  AFEES 
Space  Perception  subtest  raw  score,  tested  at  AFEES 
Mathematics  Knowledge  subtest  raw  score,  tested  at  AFEES 
Electronics  Information  subtest  raw  score,  tested  at  AFEES 
Mechanical  Comprehension  subtest  raw  score,  tested  at  AFEES 
General  Science  subtest  raw  score,  tested  at  AFEES 
Shop  Information  subtest  raw  score,  tested  at  AFEES 
Automotive  Information  subtest  raw  score,  tested  at  AFEES 
Mechanical  Interest  subtest  raw  score,  tested  at  AFEES 
Attentiveness  Interest  subtest  raw  score,  tested  at  AFEES 
Electronics  Interest  subtest  raw  score,  tested  at  AFEES 
Combat  Interest  subtest  raw  score,  tested  at  AFEES 
Combat  aptitude  composite  raw  score,  tested  at  Depot 
Field  Artillery  composite  raw  score,  tested  at  Depot 
Mechanical  Maintenance  composite  raw  score,  tested  at  Depot 
General  Maintenance  composite  raw  score,  tested  at  Depot 
Clerical  composite  raw  score,  tested  at  Depot 
General  Technical  composite  raw  score,  tested  at  Depot 
Electronics  Repair  composite  raw  score,  tested  at  Depot 
Surveillance/Communications  composite  raw  score,  tested  at 
Depot 

Skilled  Technical  composite  raw  score,  tested  at  Depot 

Operators/Food  composite  raw  score,  tested  at  Depot 

AFQT  raw  score,  tested  at  Depot 

Written  test,  standardized 

Hands-on  test,  standardized 

Hands-on  efficiency,  standardized 

WRSTAND  plus  FCGSTAND  plus  HOCEFFST,  standardized 
Final  course  grade  (training),  standardized 
WRSTAND  plus  HOCEFFST,  standardized 
WRSTAND  plus  FCGSTAND 


aForms  6  and  7  of  the  ASVAB. 
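
The composite criteria defined above (for example, PROFICST as WRSTAND plus FCGSTAND plus HOCEFFST, standardized) can be illustrated with a minimal sketch. The report does not state the scaling it used; a mean of 50 and a standard deviation of 10 is assumed here only because the standardized variables in this appendix have means near 50 and standard deviations near 10. The function names, the use of the population standard deviation, and the raw scores are all hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def standardize(scores, target_mean=50.0, target_sd=10.0):
    """Linearly rescale scores to an assumed mean of 50 and SD of 10."""
    m = mean(scores)
    s = pstdev(scores)  # population SD; the report's choice is not stated
    if s == 0:
        raise ValueError("scores have zero variance")
    return [target_mean + target_sd * (x - m) / s for x in scores]


def composite(*components, restandardize=True):
    """Sum standardized components case by case, optionally rescaling the sum."""
    totals = [sum(case) for case in zip(*components)]
    return standardize(totals) if restandardize else totals


# Hypothetical raw scores for five examinees (not data from the report).
written_raw = [42, 55, 61, 48, 70]
hands_on_raw = [30, 28, 35, 25, 40]
grade_raw = [78, 85, 90, 80, 95]

wrstand = standardize(written_raw)
hoceffst = standardize(hands_on_raw)
fcgstand = standardize(grade_raw)

# "WRSTAND plus FCGSTAND plus HOCEFFST, standardized"
proficst = composite(wrstand, fcgstand, hoceffst)
# "WRSTAND plus FCGSTAND" (a plain sum, not rescaled)
wrfcg = composite(wrstand, fcgstand, restandardize=False)
```

Under these assumptions a restandardized composite such as PROFICST returns to a mean of 50, while an unscaled sum of two standardized scores such as WRFCG has a mean of 100, which is in line with the WRFCG mean of about 101 reported in Part A.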
TABLE C-5 (Cont'd)

Part B: Corrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

ARGI         9.634       3.200
ARNO        30.153      10.500
ARAD        14.008       4.000
ARWK        18.122       7.000
ARAR        11.733       4.700
ARSP        13.603       4.200
ARMK        11.191       4.900
AREI        19.298       5.700
ARMC        10.886       4.500
ARGS        10.496       4.300
ARSI        14.008       4.200
ARAI        12.267       4.800
ARCM        13.786       4.200
ARCA         9.924       3.000
ARCE         9.061       4.600
ARCC        13.908       3.300
DRCO        71.412      15.650
DRFA        59.870      16.831
DRMM        69.527      20.920
DRGM        45.389      15.780
DRCL        54.443      11.646
DRGT        29.931       9.322
DREL        51.122      17.512
DRSC        54.954      14.538
DRST        32.496      11.424
DROF        31.595       8.441
DRGCT       43.916      10.958
WRSTAND     50.235      11.293
HOCEFFST    49.620      10.567
PROFICST    50.247      12.647
FCGSTAND    50.709      12.662
PROFIC2S    49.973      11.334
WRFCG      100.944      21.999

N OF CASES = 131
TABLE C-6

INTERCORRELATION MATRICES FOR INFANTRY RIFLEMAN
TESTED WITH ASVAB 6/7

Part A: Uncorrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

ARGI         9.107       2.797
ARNO        29.979       3.411
ARAD        15.154       3.547
ARWK        17.900       4.352
ARAR        11.207       3.213
ARSP        12.307       3.662
ARMK        10.100       4.044
AREI        17.421       4.655
ARMC         9.735       3.459
ARGS         9.350       3.476
ARSI        13.100       3.559
ARAI        10.336       3.932
ARCM        12.036       4.120
ARCA         9.521       2.930
ARCE         3.136       4.223
ARCC        13.336       3.575
DRCO        69.336      10.439
DRFA        54.150      11.904
DRMM        59.750      14.936
DRGM        40.750      11.143
DRCL        53.100       9.214
DRGT        27.343       7.524
DREL        46.171      12.005
DRSC        50.493      11.071
DRST        29.743       3.376
DROF        23.093       6.776
DRGCT       40.764       3.933
WRSTAND     49.670       8.309
HOSTAND     49.625       9.849
TESTSCOR    99.294      15.303

N OF CASES = 140
TABLE C-6 (Cont'd)

Variable  Description

ARGI      General Information subtest raw score, tested at AFEES(a)
ARNO      Numerical Operations subtest raw score, tested at AFEES
ARAD      Attention to Detail subtest raw score, tested at AFEES
ARWK      Word Knowledge subtest raw score, tested at AFEES
ARAR      Arithmetic Reasoning subtest raw score, tested at AFEES
ARSP      Space Perception subtest raw score, tested at AFEES
ARMK      Mathematics Knowledge subtest raw score, tested at AFEES
AREI      Electronics Information subtest raw score, tested at AFEES
ARMC      Mechanical Comprehension subtest raw score, tested at AFEES
ARGS      General Science subtest raw score, tested at AFEES
ARSI      Shop Information subtest raw score, tested at AFEES
ARAI      Automotive Information subtest raw score, tested at AFEES
ARCM      Mechanical Interest subtest raw score, tested at AFEES
ARCA      Attentiveness Interest subtest raw score, tested at AFEES
ARCE      Electronics Interest subtest raw score, tested at AFEES
ARCC      Combat Interest subtest raw score, tested at AFEES
DRCO      Combat aptitude composite raw score, tested at Depot
DRFA      Field Artillery composite raw score, tested at Depot
DRMM      Mechanical Maintenance composite raw score, tested at Depot
DRGM      General Maintenance composite raw score, tested at Depot
DRCL      Clerical composite raw score, tested at Depot
DRGT      General Technical composite raw score, tested at Depot
DREL      Electronics Repair composite raw score, tested at Depot
DRSC      Surveillance/Communications composite raw score, tested at
          Depot
DRST      Skilled Technical composite raw score, tested at Depot
DROF      Operators/Food composite raw score, tested at Depot
DRGCT     AFQT raw score, tested at Depot
WRSTAND   Written test, standardized
HOSTAND   Hands-on test, standardized
TESTSCOR  WRSTAND plus HOSTAND

(a) Forms 6 and 7 of the ASVAB.
TABLE C-6 (Cont'd)

Part B: Corrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

ARGI         9.107       3.200
ARNO        29.979      10.500
ARAD        15.164       4.000
ARWK        17.900       7.000
ARAR        11.207       4.700
ARSP        12.307       4.200
ARMK        10.100       4.900
AREI        17.421       5.700
ARMC         9.736       4.500
ARGS         9.350       4.300
ARSI        13.100       4.200
ARAI        10.336       4.800
ARCM        12.086       4.200
ARCA         9.521       3.000
ARCE         8.186       4.600
ARCC        13.386       3.800
DRCO        69.336      13.514
DRFA        54.150      17.018
DRMM        59.750      18.633
DRGM        40.750      15.905
DRCL        53.100      13.291
DRGT        27.843      11.577
DREL        46.171      17.065
DRSC        50.493      17.568
DRST        29.743      12.939
DROF        28.093       8.695
DRGCT       40.764      14.205
WRSTAND     49.670      11.099
HOSTAND     49.625      11.607
TESTSCOR    99.294      20.389

N OF CASES = 140
TABLE C-7

INTERCORRELATION MATRICES FOR INFANTRY RIFLEMAN
TESTED WITH ASVAB 8/9/10

Part A: Uncorrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

AR8GS       16.679       3.377
AR8AR       21.000       5.339
AR8WK       27.925       4.953
AR8PC       11.755       2.385
AR8NO       37.387       7.963
AR8CS       46.033      11.502
AR8AS       17.679       4.607
AR8MK       13.679       5.026
AR8MC       17.793       4.276
AR8EI       13.472       3.214
D8CL       102.830      13.530
D8CO       101.226      17.422
D8EL       103.566      15.664
D8FA       103.641      17.474
D8GM       103.264      18.232
D8GT       103.283      16.472
D8MM       103.491      19.048
WRSTAND     53.287       9.691
HOSTAND     52.129      10.121
TESTSCOR   105.416      16.358
FCGST       52.355       9.365
WRFCG      105.641      16.723
PROFIC     157.770      23.116

N OF CASES = 53
TABLE C-7 (Cont'd)

Variable  Description

AR8GS     General Science subtest raw score, tested at AFEES(a)
AR8AR     Arithmetic Reasoning subtest raw score, tested at AFEES
AR8WK     Word Knowledge subtest raw score, tested at AFEES
AR8PC     Paragraph Comprehension subtest raw score, tested at AFEES
AR8NO     Numerical Operations subtest raw score, tested at AFEES
AR8CS     Coding Speed subtest raw score, tested at AFEES
AR8AS     Auto/Shop Information subtest raw score, tested at AFEES
AR8MK     Mathematics Knowledge subtest raw score, tested at AFEES
AR8MC     Mechanical Comprehension subtest raw score, tested at AFEES
AR8EI     Electronics Information subtest raw score, tested at AFEES
D8CL      Clerical aptitude composite, tested at Depot
D8CO      Combat aptitude composite, tested at Depot
D8EL      Electronics Repair aptitude composite, tested at Depot
D8FA      Field Artillery aptitude composite, tested at Depot
D8GM      General Maintenance aptitude composite, tested at Depot
D8GT      General Technical aptitude composite, tested at Depot
D8MM      Mechanical Maintenance aptitude composite, tested at Depot
WRSTAND   Written test score, standardized
HOSTAND   Hands-on test score, standardized
TESTSCOR  WRSTAND plus HOSTAND
FCGST     Final course grade (training), standardized
WRFCG     WRSTAND plus FCGST
PROFIC    WRSTAND plus HOSTAND plus FCGST

(a) Forms 8, 9, and 10 of the ASVAB.
TABLE C-7 (Cont'd)

Part B: Corrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

AR8GS       16.679       5.010
AR8AR       21.000       7.370
AR8WK       27.925       7.710
AR8PC       11.755       3.360
AR8NO       37.387      10.990
AR8CS       46.038      16.250
AR8AS       17.679       5.550
AR8MK       13.679       6.390
AR8MC       17.793       5.350
AR8EI       13.472       4.240
D8CL       102.830      23.547
D8CO       101.226      22.940
D8EL       103.566      21.872
D8FA       103.642      23.655
D8GM       103.264      23.621
D8GT       103.283      23.282
D8MM       103.491      23.330
WRSTAND     53.287      13.476
HOSTAND     52.129      11.624
TESTSCOR   105.416      22.300
FCGST       52.355      10.202
WRFCG      105.642      21.267
PROFIC     157.770      29.341

N OF CASES = 53
TABLE C-8

INTERCORRELATION MATRICES FOR COMBINED
INFANTRY RIFLEMAN SAMPLES, TESTED WITH
ASVAB 6, 7, 8, 9, AND 10

Part A: Uncorrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

GS          43.494       7.346
AR          43.357       7.546
WK          49.403       6.498
NO          49.363       3.366
AS          49.975       7.318
MK          43.764       8.207
MC          49.714       3.507
EI          43.535       8.044
WRSTAND     50.990       9.160
HOSTAND     50.333      10.160
TESTSCST    49.934      10.000
CORECODE    35.535      13.557
AMGS        51.443      16.057
TIMEIN      12.979       9.131
RANK         2.104        .791

N OF CASES = 241
TABLE C-8 (Cont'd)

Variable  Description

GS        General Science subtest standard score, ASVAB forms 6, 7,
          8, 9, and 10
AR        Arithmetic Reasoning subtest standard score, ASVAB forms
          6, 7, 8, 9, and 10
WK        Word Knowledge subtest standard score, ASVAB forms 6, 7,
          8, 9, and 10
NO        Numerical Operations subtest standard score, ASVAB forms
          6, 7, 8, 9, and 10
AS        Auto/Shop Information subtest standard score, ASVAB forms
          6, 7, 8, 9, and 10
MK        Mathematics Knowledge subtest standard score, ASVAB forms
          6, 7, 8, 9, and 10
MC        Mechanical Comprehension subtest standard score,
          ASVAB forms 6, 7, 8, 9, and 10
EI        Electronics Information subtest standard score,
          ASVAB forms 6, 7, 8, 9, and 10
WRSTAND   Written test, standardized
HOSTAND   Hands-on test, standardized
TESTSCST  WRSTAND plus HOSTAND, standardized
CORECODE  Combat aptitude composite standard score
AMGS      AFQT score, tested at AFEES
TIMEIN    Time served in the Marine Corps, computed
RANK      Enlisted grade
TABLE C-8 (Cont'd)

Part B: Corrected Intercorrelation Matrix

                      Standard
Variable      Mean   deviation

GS          41.494      10.000
AR          49.357      10.000
WK          49.403      10.000
NO          49.863      10.000
AS          49.975      10.090
MK          43.764      10.000
MC          49.714      10.000
EI          43.535      10.000
WRSTAND     50.990      10.926
HOSTAND     50.333      10.941
TESTSCST    49.934      11.633
CORECODE    95.535      20.474
AMGS        51.443      23.192
TIMEIN      12.979       9.400
RANK         2.104        .326

N OF CASES = 241